Abstract

Independent component analysis (ICA) is linked up with the problem of estimating a nonlinear
functional of a density, for which optimal estimators are well known. The precision
of ICA is analyzed from the viewpoint of functional spaces in the wavelet framework.
In particular, it is shown that, under Besov smoothness conditions, parametric rate of
convergence is achieved by a U-statistic estimator of the wavelet ICA contrast, while the
previously introduced plug-in estimator $\hat C_j^2$, with moderate computational cost, has a rate
in $n^{-4s/(4s+d)}$.
1. Introduction



In signal processing, blind source separation consists in the identification of analog,
independent signals mixed by a black-box device. In psychometrics, one has the notion of
structural latent variable whose mixed effects are only measurable through series of tests;
an example is the Big Five, identified from factorial analysis by researchers in the domain
of personality evaluation (Roch, 1995). Other application fields such as digital imaging,
biomedicine, finance and econometrics also use models aiming to recover hidden independent
factors from observation. Independent component analysis (ICA) is one such tool; it can
be seen as an extension of principal component analysis, in that it goes beyond a simple
linear decorrelation only satisfactory for a normal distribution; or as a complement, since
its application is precisely pointless under the assumption of normality.



Papers on ICA are found in the fields of signal processing, neural networks, statistics and 
information theory. Comon (1994) defined the concept of ICA as maximizing the degree 
of statistical independence among outputs using contrast functions approximated by the 
Edgeworth expansion of the Kullback-Leibler divergence. 

The model is usually stated as follows: let $X$ be a random variable on $\mathbb{R}^d$, $d \ge 2$; find
pairs $(A, S)$ such that $X = AS$, where $A$ is a square invertible matrix and $S$ a latent
random variable whose components are mutually independent. This is usually done by
minimizing some contrast function that cancels out if, and only if, the components of $WX$
are independent, where $W$ is a candidate for the inversion of $A$.

Matrix $A$ is identifiable up to a scaling matrix and a permutation matrix if and only if $S$
has at most one Gaussian component (Comon, 1994).

Maximum-likelihood methods and contrast functions based on mutual information or other
divergence measures between densities are commonly employed. Bell and Sejnowski (1990s)
published an approach based on the Infomax principle. Cardoso (1999) used higher-order
cumulant tensors, which led to the Jade algorithm. Miller and Fisher III (2003) proposed
a contrast based on spacings estimates of entropy. Bach and Jordan (2002) proposed a
contrast function based on canonical correlations in a reproducing kernel Hilbert space.
Similarly, Gretton et al. (2003) proposed kernel covariance and kernel mutual information
contrast functions. Tsybakov and Samarov (2002) proposed a method of direct estimation
of $A$, based on nonparametric estimates of matrix functionals using the gradient of $f_A$.

Let $f$ be the density of the latent variable $S$ relative to Lebesgue measure, assuming it
exists. The observed variable $X = AS$ has the density $f_A$, given by

$$f_A(x) = |\det A^{-1}|\, f(A^{-1}x) = |\det B|\, f^1(b_1 x) \cdots f^d(b_d x),$$

where $b_\ell$ is the $\ell$th row of the matrix $B = A^{-1}$; this results from a change of variable
if the latent density $f$ is equal to the product of its marginals $f^1 \cdots f^d$. In this regard,
latent variable $S = (S^1, \ldots, S^d)$ having independent components means independence of the
random variables $S^\ell \circ \pi^\ell$ defined on some product probability space $\Omega = \prod_\ell \Omega^\ell$, with $\pi^\ell$ the
canonical projections. So $S$ can be defined as the compound of the unrelated $S^1, \ldots, S^d$
sources.
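The change of variable above can be checked numerically on a small case. The following is a minimal sketch only, assuming uniform marginals on $[0, 1)$ and a hand-picked matrix `A`; the helper names `mixed_density` and `uniform01` are illustrative and do not come from the paper.

```python
import numpy as np

def mixed_density(x, A, marginals):
    """Evaluate f_A(x) = |det B| * prod_l f^l(b_l x), with B = A^{-1}."""
    B = np.linalg.inv(A)
    s = B @ np.asarray(x, dtype=float)      # s[l] = b_l x, candidate source values
    value = abs(np.linalg.det(B))
    for f_l, s_l in zip(marginals, s):
        value *= f_l(s_l)                   # product of the latent marginals
    return value

def uniform01(t):
    """Density of the uniform law on [0, 1) (assumed latent marginal)."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0

# Hypothetical mixing matrix, used for illustration only.
A = np.array([[2.0, 1.0], [0.0, 1.0]])
inside = A @ np.array([0.25, 0.5])    # image of a point of the unit square
outside = A @ np.array([1.5, 0.5])    # image of a point outside the unit square
```

With these choices the mixed density is constant, equal to $|\det A|^{-1}$, on the image of the unit square and zero elsewhere.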

In the ICA model expressed this way, both $f$ and $A$ are unknown, and the data consists in
a random sample of $f_A$. The semi-parametric case corresponds to $f$ left unspecified, except
for general regularity assumptions.

In this paper, we consider the exact contrast provided by the factorization measure
$\int |f_A - f_A^*|^2$, with $f_A^*$ the product of the marginals of $f_A$. Let us mention that the idea of
comparing in the $L_2$ norm a joint density with the product of its marginals can be traced
back to Rosenblatt (1975).

Estimation of a quadratic functional 

The problem of estimating nonlinear functionals of a density has been widely studied. In
estimating $\int f^2$ under Hölder smoothness conditions, Bickel and Ritov (1988) have shown
that parametric rate is achievable for a regularity $s > 1/4$, whereas when $s \le 1/4$, minimax
rates of convergence under mean squared error are of the order of $n^{-8s/(1+4s)}$. This result
has been extended to general functionals of a density $\int \phi(f)$ by Birgé and Massart (1995).
Laurent (1996) has built efficient estimates for $s > 1/4$.

Let $P_j$ be the projection operator on a multiresolution analysis (MRA) at level $j$, with
scaling function $\varphi$, and let $\alpha_{jk} = \int f \varphi_{jk}$ be the coordinate $k$ of $f$.

In the wavelet setting, given a sample $X = \{X_1, \ldots, X_n\}$ of a density $f$ defined on $\mathbb{R}$,
independent and identically distributed, the U-statistic

$$\hat B_j^2 = \frac{2}{n(n-1)} \sum_{i_1 < i_2} \sum_{k \in \mathbb{Z}} \varphi_{jk}(X_{i_1}) \varphi_{jk}(X_{i_2})$$

with mean $\int (P_j f)^2$ is the usual optimal estimator of the quantity $\int f^2$; see Kerkyacharian
and Picard (1996), and Tribouley (2000) for the white noise model with adaptive rules.

In what follows, this result is implicitly extended to $d$ dimensions using a tensorial
wavelet basis $\Phi_{jk}$, with $\Phi_{jk}(x) = \varphi_{jk^1}(x^1) \cdots \varphi_{jk^d}(x^d)$, $k \in \mathbb{Z}^d$, $x \in \mathbb{R}^d$; that is to say,
with $X$ an independent, identically distributed sample of a density $f$ on $\mathbb{R}^d$, the U-statistic
$\hat B_j^2 = \frac{2}{n(n-1)} \sum_{i_1 < i_2} \sum_{k \in \mathbb{Z}^d} \Phi_{jk}(X_{i_1}) \Phi_{jk}(X_{i_2})$, with mean $\int (P_j f)^2 = \sum_{k \in \mathbb{Z}^d} \alpha_{jk}^2$, is
also optimal in estimating the quantity $\int_{\mathbb{R}^d} f^2$.

In the case of a compactly supported density $f$, $\hat B_j^2$ is computable with a Daubechies
wavelet D2N and dyadic approximation of $X$, but the computational cost is basically in
$O(n^2 (2N-1)^d)$, which is generally too high in practice.

On the other hand, the plug-in, biased, estimator $\hat H_j^2(f) = \sum_k \big[\frac{1}{n} \sum_i \Phi_{jk}(X_i)\big]^2 = \sum_k \hat\alpha_{jk}^2$
enjoys both ease of computation and ease of transitions between resolutions through discrete
wavelet transform (DWT), since it builds upon a preliminary estimation of all individual
wavelet coordinates of $f$ on the projection space at level $j$, that is to say a full density
estimation. In this setting it is just as easy to compute $\sum_k |\hat\alpha_{jk}|^p$, $p \ge 1$, or even
$\sup_k |\hat\alpha_{jk}|$, with a fixed computational cost in $O(n(2N-1)^d)$ plus the sum total, or the
search for the max, of a $2^{jd}$ array.

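Both estimators $\hat H_j^2$ and $\hat B_j^2$ build on the same kernel $h_j(x, y) = \sum_{k \in \mathbb{Z}^d} \Phi_{jk}(x) \Phi_{jk}(y)$ since
they are written

$$\hat H_j^2(X) = (n^2)^{-1} \sum_{i \in \Omega_n^2} h_j(X_{i^1}, X_{i^2}) \quad\text{and}\quad \hat B_j^2(X) = (A_n^2)^{-1} \sum_{i \in I_n^2} h_j(X_{i^1}, X_{i^2}),$$

where, here and in the sequel, $\Omega_n^m = \{(i^1, \ldots, i^m): i^\ell \in \mathbb{N}, 1 \le i^\ell \le n\}$,
$I_n^m = \{i \in \Omega_n^m: \ell_1 \ne \ell_2 \Rightarrow i^{\ell_1} \ne i^{\ell_2}\}$ and $A_n^p = n!/(n-p)!$.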
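The two averages can be compared concretely in the simplest compactly supported case. The sketch below assumes the Haar scaling function in dimension one and a sample lying in $[0, 1)$, in which case the kernel reduces to $h_j(x, y) = 2^j$ when $x$ and $y$ share a dyadic cell and $0$ otherwise; the function name `haar_plug_in_and_u_stat` is illustrative and not taken from the paper.

```python
import numpy as np

def haar_plug_in_and_u_stat(x, j):
    """Return (H2, B2) for a 1-d sample in [0, 1) with the Haar scaling function.

    phi_jk(x) = 2^(j/2) * 1{k <= 2^j x < k + 1}, hence
    h_j(x, y) = 2^j * 1{x and y fall in the same dyadic cell}.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    cells = np.floor(x * 2**j).astype(int)
    counts = np.bincount(cells, minlength=2**j).astype(float)
    sum_all_pairs = 2**j * np.sum(counts**2)   # kernel summed over Omega_n^2
    diagonal = n * 2**j                        # contribution of pairs with i1 == i2
    h2 = sum_all_pairs / n**2                  # plug-in (Von Mises) statistic
    b2 = (sum_all_pairs - diagonal) / (n * (n - 1))   # U-statistic over I_n^2
    return h2, b2

h2, b2 = haar_plug_in_and_u_stat([0.05, 0.3, 0.31, 0.8, 0.81, 0.82], j=2)
```

In this Haar case the only difference between the two statistics is the diagonal term $n 2^j$, so both are obtained from cell counts in one pass over the sample; this shortcut is specific to the Haar kernel and is not claimed for general D2N wavelets.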

The plug-in estimator $\hat H_j^2$ is then identified as the Von Mises statistic associated to $\hat B_j^2$. In
estimating $\sum_k \alpha_{jk}^2$, the mean squared error of unbiased $\hat B_j^2$ is merely its variance, while the
mean squared error of $\hat H_j^2$ adds a squared component $E(\hat H_j^2 - \hat B_j^2)^2$ because of the inequality
$(\hat H_j^2 - \sum_k \alpha_{jk}^2)^2 \le 2(\hat H_j^2 - \hat B_j^2)^2 + 2(\hat B_j^2 - \sum_k \alpha_{jk}^2)^2$.

From general results, a U-statistic with finite second raw moment has a variance in $Cn^{-1}$ and
under similar conditions, the difference $E|U - V|^r$ between the U-statistic and its associated
Von Mises statistic is of the order of $n^{-r}$ (see for instance Serfling, 1980).

In the wavelet case, the dependence of the statistics on the resolution $j$ calls for special
treatment in computing these two quantities. This special computation, taking $j$ and other
properties of wavelets into account, constitutes the main topic of the paper. In particular,
whether $2^{jd}$ is lower than $n$ or not is a critical threshold for resolution parameter $j$. Moreover,
on the set $\{j: 2^{jd} > n^2\}$, the statistic $\hat B_j^2$, and therefore also $\hat H_j^2$, have a mean squared error
not converging to zero.

If $\hat B_j^2$ and $\hat H_j^2$ share some features in estimating $\sum_k \alpha_{jk}^2 = \int (P_j f)^2$, they differ in an essential
way: the kernel $h_j$ is averaged in one case over $\Omega_n^2$, the set of unconstrained indexes, and
in the other case over $I_n^2$, the set of distinct indexes. As a consequence, it is shown in the
sequel that $\hat H_j^2$ has mean squared error of the order of $2^{jd} n^{-1}$, which makes it inoperable
as soon as $2^{jd} \ge n$, while $\hat B_j^2$ has mean squared error of the order of $2^{jd} n^{-2}$, which is then
parametric on the set $\{j: 2^{jd} < n\}$. In a general way, this same parallel $\Omega_n^m$ versus $I_n^m$ is
underpinning most of the proofs presented throughout the paper.
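The role of the diagonal terms can be made explicit by a short computation. This is a sketch only, assuming a bounded, compactly supported $\Phi$ so that $\sum_k \Phi_{jk}(x)^2 = 2^{jd} \sum_k \Phi(2^j x - k)^2 \le C 2^{jd}$; it uses nothing beyond the definitions of $\hat H_j^2$, $\Omega_n^2$ and $h_j$ given above.

```latex
\begin{align*}
E\,\hat H_j^2
  &= \frac{1}{n^2} \sum_{i \in \Omega_n^2} E\, h_j(X_{i^1}, X_{i^2})
   = \frac{n(n-1)}{n^2} \sum_k \alpha_{jk}^2 + \frac{1}{n}\, E\, h_j(X_1, X_1), \\
E\,\hat H_j^2 - \sum_k \alpha_{jk}^2
  &= \frac{1}{n} \Big( E \sum_k \Phi_{jk}(X_1)^2 - \sum_k \alpha_{jk}^2 \Big)
   \le C\, 2^{jd}\, n^{-1}.
\end{align*}
```

The $n$ diagonal pairs $i^1 = i^2$, which are excluded from $I_n^2$, are thus the only source of bias, and their contribution does not vanish once $2^{jd}$ reaches the order of $n$.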

Wavelet ICA 

Let $f$ be the latent density in the semi-parametric model introduced above. Let $f_A$ be the
mixed density and let $f_A^*$ be the product of the marginals of $f_A$.

Assume, as regularity condition, that $f$ belongs to a Besov class $B_{s2\infty}$. It has been checked
in previous work (Barbedor, 2005) that $f_A$ and $f_A^*$, hence $f_A - f_A^*$, belong to the same Besov
space as $f$.

As usual, the very definition of Besov spaces (here $B_{s2\infty}$) and an orthogonality property of
the projection spaces $V_j$ and $W_j$ entail the relation

$$0 \le \int (f_A - f_A^*)^2 - \int [P_j(f_A - f_A^*)]^2 \le C 2^{-2js}.$$

In this relation, the quantity $\int [P_j(f_A - f_A^*)]^2$ is recognized as the wavelet ICA contrast
$C_j^2(f_A - f_A^*)$ introduced in a preliminary paper (Barbedor, 2005).

The wavelet ICA contrast is then a factorization measure with bias, in the sense that a
zero contrast implies independence of the projected densities, and that independence in
projection transfers to original densities up to some bias $2^{-2js}$.

Assume for a moment that the difference $f_A - f_A^*$ is a density and that we dispose of an
independent, identically distributed sample $S$ of this difference. Computing the estimators
$\hat B_j^2(S)$ or $\hat H_j^2(S)$ provides an estimation of $\int (f_A - f_A^*)^2$, the exact ICA factorization measure.
In this case, the $j^*$ realizing the best compromise between the mean squared error in $C_j^2$
estimation and the bias of the ICA wavelet contrast $2^{-2js}$ is exactly the same as the one
that minimizes the overall risk in estimating the quadratic functional $\int (f_A - f_A^*)^2$. It is found
by balancing bias and variance, a standard procedure in nonparametric estimation. From
what was said above, $\hat B_j^2(S)$ would be an optimal estimator of the exact factorization measure
$\int (f_A - f_A^*)^2$.

The previous assumption being heuristic only, and since, in ICA, the data at hand is a
random sample of $f_A$ and not of $f_A - f_A^*$, we are led to consider estimators different from $\hat B_j^2$
and $\hat H_j^2$, but still alike in some way.

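Indeed, let $\delta_{jk} = \int (f_A - f_A^*) \Phi_{jk}$ be the coordinate of the difference function $f_A - f_A^*$. In the ICA
context, $\delta_{jk}$ is estimable only through the difference $(\alpha_{jk} - \alpha_{jk^1} \cdots \alpha_{jk^d})$ where $\alpha_{jk} = \int f_A \Phi_{jk}$
is the coordinate of $f_A$ and $\alpha_{jk^\ell} = \int f_A^{*\ell} \varphi_{jk^\ell}$ refers to the coordinate of marginal number $\ell$
of $f_A$, written $f_A^{*\ell}$.

To estimate $\sum_k \delta_{jk}^2$, estimators of the type $\hat B_j^2$ and $\hat H_j^2$ are not alone enough. Instead we
use the already introduced wavelet contrast estimator (plug-in), $\hat C_j^2(X) = \sum_k (\hat\alpha_{jk^1, \ldots, k^d} - \hat\alpha_{jk^1} \cdots \hat\alpha_{jk^d})^2$,
and the corresponding U-statistic estimator of order $2d + 2$,

$$\hat D_j^2(X) = \frac{1}{A_n^{2d+2}} \sum_{i \in I_n^{2d+2}} \sum_{k \in \mathbb{Z}^d} \big[\Phi_{jk}(X_{i^1}) - \varphi_{jk^1}(X^1_{i^2}) \cdots \varphi_{jk^d}(X^d_{i^{d+1}})\big] \big[\Phi_{jk}(X_{i^{d+2}}) - \varphi_{jk^1}(X^1_{i^{d+3}}) \cdots \varphi_{jk^d}(X^d_{i^{2d+2}})\big]$$

with as above $I_n^m = \{(i^1, \ldots, i^m): i^\ell \in \mathbb{N}, 1 \le i^\ell \le n, i^{\ell_1} \ne i^{\ell_2} \text{ if } \ell_1 \ne \ell_2\}$ and $X^\ell$ referring to
the dimension $\ell$ of $X \in \mathbb{R}^d$.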

As it turns out, the U-statistic estimator $\hat D_j^2$ computed on the full sample $X$ is slightly
suboptimal, compared to the rate of a $\hat B_j^2$ in estimating a bare quadratic functional.

As an alternative to $\hat D_j^2(X)$, we are then led to consider various U-statistic and plug-in
estimators based on splits of the full sample, which seems the only way to find back the
well-known optimal convergence rate of the estimation of a quadratic functional, for reasons
that will be explained in the course of the proofs.

These additional estimators and conditions of use, together with the full sample estimators
$\hat C_j^2$ and $\hat D_j^2$, are presented in section 3.

Section 2 of the paper recalls some essential definitions for the convenience of the reader 
not familiar with wavelets and Besov spaces, and may be skipped. 

Section 4 is entirely devoted to the computation of a risk bound for the different estimators
presented in section 3.

We refer the reader to a preliminary paper on ICA by wavelets (Barbedor, 2005) which
contains numerical simulations, details on the implementation of the wavelet contrast
estimator and other practical considerations not repeated here. Note that this paper gives
an improved convergence rate in $C 2^{jd} n^{-1}$ for the wavelet contrast estimator $\hat C_j^2$, already
introduced in the preliminary paper.



1.1 Notations 

We set here general notations and recall some definitions for the convenience of ICA 
specialists. The reader already familiar with wavelets and Besov spaces can skip this part. 

■ Wavelets 

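Let $\varphi$ be some function of $L_2(\mathbb{R})$ such that the family of translates $\{\varphi(\cdot - k), k \in \mathbb{Z}\}$ is an
orthonormal system; let $V_j \subset L_2(\mathbb{R})$ be the subspace spanned by $\{\varphi_{jk} = 2^{j/2} \varphi(2^j \cdot - k), k \in \mathbb{Z}\}$.

By definition, the sequence of spaces $(V_j)$, $j \in \mathbb{Z}$, is called a multiresolution analysis (MRA)
of $L_2(\mathbb{R})$ if $V_j \subset V_{j+1}$ and $\bigcup_{j \ge 0} V_j$ is dense in $L_2(\mathbb{R})$; $\varphi$ is called the father wavelet or scaling
function.

Let $(V_j)_{j \in \mathbb{Z}}$ be a multiresolution analysis of $L_2(\mathbb{R})$, with $V_j$ spanned by $\{\varphi_{jk} = 2^{j/2} \varphi(2^j \cdot - k), k \in \mathbb{Z}\}$.
Define $W_j$ as the complement of $V_j$ in $V_{j+1}$, and let the families $\{\psi_{jk}, k \in \mathbb{Z}\}$ be a
basis for $W_j$, with $\psi_{jk}(x) = 2^{j/2} \psi(2^j x - k)$. Let $\alpha_{jk}(f) = \langle f, \varphi_{jk} \rangle$ and $\beta_{jk}(f) = \langle f, \psi_{jk} \rangle$.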
A function $f \in L_2(\mathbb{R})$ admits a wavelet expansion on $(V_j)_{j \in \mathbb{Z}}$ if the series

$$\sum_k \alpha_{j_0 k}(f) \varphi_{j_0 k} + \sum_{j = j_0}^{\infty} \sum_k \beta_{jk}(f) \psi_{jk}$$

is convergent to $f$ in $L_2(\mathbb{R})$; $\psi$ is called a mother wavelet.

A MRA in dimension one also induces an associated MRA in dimension d, using the tensorial 
product procedure below. 

Define $V_j^d$ as the tensorial product of $d$ copies of $V_j$. The increasing sequence $(V_j^d)_{j \in \mathbb{Z}}$ defines
a multiresolution analysis of $L_2(\mathbb{R}^d)$ (Meyer, 1997):

- for $(i^1, \ldots, i^d) \in \{0, 1\}^d$ and $(i^1, \ldots, i^d) \ne (0, \ldots, 0)$, define

$$\Psi(x)_{i^1, \ldots, i^d} = \prod_{\ell=1}^{d} \psi^{(i^\ell)}(x^\ell), \qquad (1)$$

with $\psi^{(0)} = \varphi$, $\psi^{(1)} = \psi$, so that $\psi$ appears at least once in the product $\Psi(x)$ (from now on we
omit $i^1, \ldots, i^d$ in the notation for $\Psi$, and in (2), although it is present each time);

- for $(i^1, \ldots, i^d) = (0, \ldots, 0)$, define $\Phi(x) = \prod_{\ell=1}^{d} \varphi(x^\ell)$;

- for $j \in \mathbb{Z}$, $k \in \mathbb{Z}^d$, $x \in \mathbb{R}^d$, let $\Psi_{jk}(x) = 2^{jd/2} \Psi(2^j x - k)$ and $\Phi_{jk}(x) = 2^{jd/2} \Phi(2^j x - k)$;

- define $W_j^d$ as the orthogonal complement of $V_j^d$ in $V_{j+1}^d$; it is an orthogonal sum of $2^d - 1$
spaces having the form $U_{1j} \otimes \cdots \otimes U_{dj}$, where $U$ is a placeholder for $V$ or $W$; $V$ or $W$ are
thus placed using up all permutations, but with $W$ represented at least once, so that a
fraction of the overall innovation brought by the finer resolution $j + 1$ is always present in
the tensorial product.

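A function $f$ admits a wavelet expansion on the basis $(\Phi, \Psi)$ if the series

$$\sum_{k \in \mathbb{Z}^d} \alpha_{j_0 k}(f) \Phi_{j_0 k} + \sum_{j = j_0}^{\infty} \sum_{k \in \mathbb{Z}^d} \beta_{jk}(f) \Psi_{jk} \qquad (2)$$

is convergent to $f$ in $L_2(\mathbb{R}^d)$.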

In connection with function approximation, wavelets can be viewed as falling in the category 
of orthogonal series methods, or also in the category of kernel methods. 

The approximation at level $j$ of a function $f$ that admits a multiresolution expansion is the
orthogonal projection $P_j f$ of $f$ onto $V_j^d \subset L_2(\mathbb{R}^d)$ defined by

$$(P_j f)(x) = \sum_{k \in \mathbb{Z}^d} \alpha_{jk} \Phi_{jk}(x),$$

where $\alpha_{jk} = \alpha_{jk^1, \ldots, k^d} = \int f(x) \Phi_{jk}(x)\, dx$.

With a concentration condition verified for compactly supported wavelets, the projection
operator can also be written

$$(P_j f)(x) = \int_{\mathbb{R}^d} K_j(x, y) f(y)\, dy,$$

with $K_j(x, y) = \sum_{k \in \mathbb{Z}^d} \Phi_{jk}(x) \Phi_{jk}(y) = 2^{jd} \sum_{k \in \mathbb{Z}^d} \Phi(2^j x - k) \Phi(2^j y - k)$. $K_j$ is an orthogonal
projection kernel with window $2^{-jd}$ (which is not translation invariant).
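The projection can be illustrated with empirical coordinates in the simplest case. The sketch below assumes dimension one, the Haar scaling function and data in $[0, 1)$, and replaces $\alpha_{jk}$ by the sample mean of $\varphi_{jk}(X_i)$; the function name `haar_projection_estimate` is illustrative only.

```python
import numpy as np

def haar_projection_estimate(sample, j, t):
    """Evaluate sum_k alpha_hat_jk * phi_jk(t) for the Haar scaling function.

    alpha_hat_jk is the empirical mean of phi_jk over the sample; at a given
    point t only the translate whose dyadic cell contains t is nonzero.
    """
    sample = np.asarray(sample, dtype=float)
    m = 2**j
    counts = np.bincount(np.floor(sample * m).astype(int), minlength=m)
    alpha_hat = 2 ** (j / 2) * counts / sample.size   # empirical coordinates
    k = int(np.floor(t * m))                          # cell containing t
    return alpha_hat[k] * 2 ** (j / 2)                # alpha_hat_jk * phi_jk(t)

value = haar_projection_estimate([0.1, 0.2, 0.6, 0.9], j=2, t=0.1)
```

In this Haar setting the projection is a histogram with bin width $2^{-j}$, which is one way to read the window of the kernel $K_j$.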

Besov spaces 

Besov spaces admit a characterization in terms of wavelet coefficients, which makes them
intrinsically connected to the analysis of curves via wavelet techniques.

$f \in L_p(\mathbb{R}^d)$ belongs to the (inhomogeneous) Besov space $B_{spq}(\mathbb{R}^d)$ if

$$J_{spq}(f) = \|\alpha_{0\cdot}\|_{\ell_p} + \Big[ \sum_{j \ge 0} \big[ 2^{js}\, 2^{jd(1/2 - 1/p)} \|\beta_{j\cdot}\|_{\ell_p} \big]^q \Big]^{1/q} < \infty,$$

with $s > 0$, $1 \le p \le \infty$, $1 \le q \le \infty$, and $\varphi, \psi \in C^r$, $r > s$ (Meyer, 1997).

Let $P_j$ be the projection operator on $V_j$ and let $D_j$ be the projection operator on $W_j$. $J_{spq}$
is equivalent to

$$J'_{spq}(f) = \|P_0 f\|_p + \Big[ \sum_{j \ge 0} \big[ 2^{js} \|D_j f\|_p \big]^q \Big]^{1/q}.$$

A more complete presentation of wavelets linked with Sobolev and Besov approximation
theorems and statistical applications can be found in the book by Härdle et al. (1998).
General references about Besov spaces are Peetre (1975), Bergh & Löfström (1976), Triebel
(1992), DeVore & Lorentz (1993).



1.2 Estimating the factorization measure $\int (f_A - f_A^*)^2$

We first recall the definition of the wavelet contrast already introduced in Barbedor (2005).

Let $f$ and $g$ be two functions on $\mathbb{R}^d$ and let $\Phi$ be the scaling function of a multiresolution
analysis of $L_2(\mathbb{R}^d)$ for which projections of $f$ and $g$ exist.

Define the approximate loss function

$$C_j^2(f - g) = \sum_{k \in \mathbb{Z}^d} \Big( \int (f - g) \Phi_{jk} \Big)^2 = \|P_j(f - g)\|_2^2.$$

It is clear that $f = g$ implies $C_j^2 = 0$ and that $C_j^2 = 0$ implies $P_j f = P_j g$ almost surely.
Let $f$ be a density function on $\mathbb{R}^d$; denote by $f^{*\ell}$ the marginal distribution in dimension $\ell$,

$$x^\ell \mapsto \int_{\mathbb{R}^{d-1}} f(x^1, \ldots, x^d)\, dx^1 \cdots dx^{\ell-1}\, dx^{\ell+1} \cdots dx^d,$$

and denote by $f^*$ the product of marginals $f^{*1} \cdots f^{*d}$. The functions $f$, $f^*$ and the $f^{*\ell}$ admit
a wavelet expansion on a compactly supported basis $(\varphi, \psi)$. Consider the projections up to
order $j$, that is to say the projections of $f$, $f^*$ and $f^{*\ell}$ on $V_j^d$ and $V_j$, namely

$$P_j f^* = \sum_{k \in \mathbb{Z}^d} \alpha_{jk}(f^*) \Phi_{jk}, \quad P_j f = \sum_{k \in \mathbb{Z}^d} \alpha_{jk}(f) \Phi_{jk} \quad\text{and}\quad P_j^\ell f^{*\ell} = \sum_{k \in \mathbb{Z}} \alpha_{jk}(f^{*\ell}) \varphi_{jk},$$

with $\alpha_{jk}(f^{*\ell}) = \int f^{*\ell} \varphi_{jk}$ and $\alpha_{jk}(f) = \int f \Phi_{jk}$. At least for compactly supported densities
and compactly supported wavelets, it is clear that $P_j f^* = P_j^1 f^{*1} \cdots P_j^d f^{*d}$.

Proposition 1.1 (ICA wavelet contrast)

Let $f$ be a compactly supported density function on $\mathbb{R}^d$ and let $\varphi$ be the scaling function of a compactly
supported wavelet.

Define the wavelet ICA contrast as $C_j^2(f - f^*)$. Then,

$$f \text{ factorizes} \implies C_j^2(f - f^*) = 0,$$

$$C_j^2(f - f^*) = 0 \implies P_j f = P_j f^{*1} \cdots P_j f^{*d} \text{ a.s.}$$

Proof: $f = f^1 \cdots f^d \implies f^{*\ell} = f^\ell$, $\ell = 1, \ldots, d$. □



Wavelet contrast and quadratic functional 

Let $f = f_I$ be a density defined on $\mathbb{R}^d$ whose components are independent, that is to
say $f$ is equal to the product of its marginals. Let $f_A$ be the mixed density given by
$f_A(x) = |\det A^{-1}|\, f(A^{-1}x)$, with $A$ a $d \times d$ invertible matrix. Let $f_A^*$ be the product of the
marginals of $f_A$. Note that when $A = I$, $f_A^* = f_I^* = f_I = f$.

By definition of a Besov space $B_{spq}(\mathbb{R}^d)$ with an $r$-regular wavelet $\varphi$, $r > s$,

$$f \in B_{spq}(\mathbb{R}^d) \iff \|f - P_j f\|_p = 2^{-js} \epsilon_j, \quad \{\epsilon_j\} \in \ell_q(\mathbb{N}). \qquad (3)$$

So, from the decomposition

$$\|f_A - f_A^*\|_2^2 = \int [P_j(f_A - f_A^*)]^2 + \int [f_A - f_A^* - P_j(f_A - f_A^*)]^2 = C_j^2(f_A - f_A^*) + \int [f_A - f_A^* - P_j(f_A - f_A^*)]^2,$$

resulting from the orthogonality of $V_j$ and $W_j$, and assuming that $f_A$ and $f_A^*$ belong to
$B_{s2\infty}(\mathbb{R}^d)$,

$$0 \le \|f_A - f_A^*\|_2^2 - C_j^2(f_A - f_A^*) \le C 2^{-2js}, \qquad (4)$$

which gives an illustration of the shrinking (with $j$) distance between the wavelet contrast
and the always bigger squared $L_2$ norm of $f_A - f_A^*$ representing the exact factorization
measure. A side effect of (4) is that $C_j^2(f_A - f_A^*) = 0$ is implied by $A = I$.
Estimators under consideration

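Let $S$ be the latent random variable with density $f$.

Define the experiment $\mathcal{E}^n = (\mathcal{X}^{\otimes n}, \mathcal{A}^{\otimes n}, (X_1, \ldots, X_n), P^n_{f_A}, f_A \in B_{spq})$, where $X_1, \ldots, X_n$ is
an iid sample of $X = AS$, and $P^n_{f_A} = P_{f_A} \otimes \cdots \otimes P_{f_A}$ is the joint distribution of $(X_1, \ldots, X_n)$.

Define the coordinates estimators

$$\hat\alpha_{jk} = \hat\alpha_{jk^1, \ldots, k^d} = \frac{1}{n} \sum_{i=1}^{n} \varphi_{jk^1}(X_i^1) \cdots \varphi_{jk^d}(X_i^d) \quad\text{and}\quad \hat\alpha_{jk^\ell} = \frac{1}{n} \sum_{i=1}^{n} \varphi_{jk^\ell}(X_i^\ell),$$

where $X^\ell$ is coordinate $\ell$ of $X \in \mathbb{R}^d$. Define also the shortcut $\hat\lambda_{jk} = \hat\alpha_{jk^1} \cdots \hat\alpha_{jk^d}$.
Define the full sample plug-in estimator

$$\hat C_j^2 = \hat C_j^2(X_1, \ldots, X_n) = \sum_{(k^1, \ldots, k^d) \in \mathbb{Z}^d} (\hat\alpha_{j(k^1, \ldots, k^d)} - \hat\alpha_{jk^1} \cdots \hat\alpha_{jk^d})^2 = \sum_{k \in \mathbb{Z}^d} (\hat\alpha_{jk} - \hat\lambda_{jk})^2 \qquad (6)$$

and the full sample U-statistic estimator

$$\hat D_j^2 = \hat D_j^2(X_1, \ldots, X_n) = \frac{1}{A_n^{2d+2}} \sum_{i \in I_n^{2d+2}} \sum_{k \in \mathbb{Z}^d} \big[\Phi_{jk}(X_{i^1}) - \varphi_{jk^1}(X^1_{i^2}) \cdots \varphi_{jk^d}(X^d_{i^{d+1}})\big] \big[\Phi_{jk}(X_{i^{d+2}}) - \varphi_{jk^1}(X^1_{i^{d+3}}) \cdots \varphi_{jk^d}(X^d_{i^{2d+2}})\big] \qquad (7)$$

where $I_n^m$ is the set of indices $\{(i^1, \ldots, i^m): i^\ell \in \mathbb{N}, 1 \le i^\ell \le n, i^{\ell_1} \ne i^{\ell_2} \text{ if } \ell_1 \ne \ell_2\}$ and
$A_n^m = \frac{n!}{(n-m)!} = |I_n^m|$.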
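The plug-in contrast is straightforward to compute once the wavelet is fixed. The following is a minimal sketch, assuming the Haar scaling function and observations in $[0, 1)^d$, so that $\hat\alpha_{jk}$ is $2^{jd/2}$ times a joint cell frequency and $\hat\alpha_{jk^\ell}$ is $2^{j/2}$ times a marginal cell frequency; the function name `haar_contrast_plug_in` is illustrative and this is not the implementation of the preliminary paper.

```python
import numpy as np

def haar_contrast_plug_in(x, j):
    """Plug-in wavelet contrast sum_k (alpha_hat_jk - lambda_hat_jk)^2, Haar case.

    x : array of shape (n, d), observations assumed to lie in [0, 1)^d.
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    m = 2**j
    cells = np.floor(x * m).astype(int)        # dyadic cell index per coordinate

    # alpha_hat_jk = 2^(jd/2) * joint cell frequency
    joint = np.zeros((m,) * d)
    np.add.at(joint, tuple(cells.T), 1.0)
    alpha = 2 ** (j * d / 2) * joint / n

    # lambda_hat_jk = product over l of the marginal coordinates alpha_hat_{j k^l}
    lam = np.ones((m,) * d)
    for l in range(d):
        marginal = 2 ** (j / 2) * np.bincount(cells[:, l], minlength=m) / n
        shape = [1] * d
        shape[l] = m
        lam = lam * marginal.reshape(shape)    # broadcast along axis l

    return float(np.sum((alpha - lam) ** 2))

product_design = [[0.1, 0.2], [0.1, 0.7], [0.6, 0.2], [0.6, 0.7]]
diagonal_design = [[0.1, 0.1], [0.6, 0.6]]
```

On the product design the empirical joint frequencies factorize and the contrast vanishes, while on the diagonal design, where both coordinates coincide, it is strictly positive.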

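Define also the U-statistic estimators

$$\hat B_j^2(\{X_1, \ldots, X_n\}) = \sum_{k \in \mathbb{Z}^d} \frac{1}{A_n^2} \sum_{i \in I_n^2} \Phi_{jk}(X_{i^1}) \Phi_{jk}(X_{i^2}),$$

$$\hat B_j^2(\{X_1^\ell, \ldots, X_n^\ell\}) = \sum_{k \in \mathbb{Z}} \frac{1}{A_n^2} \sum_{i \in I_n^2} \varphi_{jk}(X^\ell_{i^1}) \varphi_{jk}(X^\ell_{i^2}). \qquad (8)$$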



Notational remark 

Unless otherwise stated, superscripts designate coordinates of multi-dimensional entities
while subscripts designate unrelated entities of the same set without reference to
multi-dimensional unpacking. For instance, an index $k$ belonging to $\mathbb{Z}^d$ is also written
$k = (k^1, \ldots, k^d)$, with $k^\ell \in \mathbb{Z}$. Likewise a multi-index $i$ is written $i = (i^1, \ldots, i^m)$ when belonging
to some $\Omega_n^m = \{(i^1, \ldots, i^m): i^\ell \in \mathbb{N}, 1 \le i^\ell \le n\}$ or $I_n^m = \{i \in \Omega_n^m: \ell_1 \ne \ell_2 \Rightarrow i^{\ell_1} \ne i^{\ell_2}\}$,
for some $m \ge 1$; but $i_1$, $i_2$ would designate two different elements of $I_n^m$, so for instance
$\big[\sum_{i=1}^{n} \sum_{k \in \mathbb{Z}^d} \Phi_{jk}(X_i)\big]^2$ is written $\sum_{i_1, i_2} \sum_{k_1, k_2} \Phi_{jk_1}(X_{i_1}) \Phi_{jk_2}(X_{i_2})$. Finally $X^\ell$ is coordinate $\ell$
of observation $X$ and $X$ refers to a sample $\{X_1, \ldots, X_n\}$.



As was said in the introduction and as is shown in proposition 1.6, the estimator $\hat D_j^2$
computed on the full sample is slightly suboptimal. We now review some possibilities to
split the sample so that various alternatives to $\hat D_j^2$ on the full sample could be computed in
an attempt to regain optimality through block independence.

We need not consider $\hat C_j^2$ on independent sub samples because, as will be seen, the order
of its risk upper bound is given by the order of the component $\sum_k \hat\alpha_{jk}^2 - \alpha_{jk}^2$, which is not
improved by splitting the sample (contrary to $\sum_k \hat\lambda_{jk}^2 - \lambda_{jk}^2$ and $\sum_k \hat\alpha_{jk} \hat\lambda_{jk} - \alpha_{jk} \lambda_{jk}$). The
rate of $\hat C_j^2$ is unchanged compared to what appeared in Barbedor (2005).

Sample split 

■ Split the full sample $\{X_1, \ldots, X_n\}$ in $d + 1$ disjoint sub samples $R^0, R^1, \ldots, R^d$ where the
sample $R^0$ refers to a plain section of the full sample, $\{X_1, \ldots, X_{[n/(d+1)]}\}$ say, and the
samples $R^1, \ldots, R^d$ refer to dimension $\ell$ of their section of the full sample, for instance
$\{X^\ell_{[n/(d+1)]\ell + 1}, \ldots, X^\ell_{[n/(d+1)](\ell + 1)}\}$.

Estimate each plug-in $\hat\alpha_{jk}(R^0)$ and $\hat\alpha_{jk^\ell}(R^\ell)$, and the U-statistics $\hat B_j^2(R^0)$, $\hat B_j^2(R^\ell)$, $\ell = 1, \ldots, d$,
on each independent sub-sample. This leads to the definition of the $d + 1$ samples mixed
plug-in estimator

$$\hat F_j^2(R^0, R^1, \ldots, R^d) = \hat B_j^2(R^0) + \prod_{\ell=1}^{d} \hat B_j^2(R^\ell) - 2 \sum_{k \in \mathbb{Z}^d} \hat\alpha_{jk}(R^0)\, \hat\alpha_{jk^1}(R^1) \cdots \hat\alpha_{jk^d}(R^d) \qquad (9)$$

to estimate the quantity $\sum_k \alpha_{jk}^2 + \prod_{\ell=1}^{d} \big( \sum_{k^\ell \in \mathbb{Z}} \alpha_{jk^\ell}^2 \big) - 2 \sum_k \alpha_{jk} \alpha_{jk^1} \cdots \alpha_{jk^d} = C_j^2$.

Using estimators $\hat B_j^2$ places us in the exact replication of the case $\hat B_j^2$ found in Kerkyacharian
and Picard (1996), except for an estimation taking place in dimension $d$ in the case of $\hat B_j^2(R^0)$.
The risk of this procedure is given by proposition 1.3.

■ Using the full sample $\{X_1, \ldots, X_n\}$ we can generate an identically distributed sample of $f_A^*$,
namely $DS = \bigcup_{i \in \Omega_n^d} \{X^1_{i^1} \ldots X^d_{i^d}\}$, but it is not constituted of independent observations when

But then using a Hoeffding like decomposition, we can pick from $DS$ a sample of
independent observations, $IS = \bigcup_{k = 1 \ldots [n/d]} \{X^1_{(k-1)d+1} \ldots X^d_{kd}\}$, although it leads to a somewhat
arbitrary omission of a large part of the information available. Nevertheless we can assume
that we dispose of two independent, identically distributed samples, one for $f_A$ labelled $R$
and one for $f_A^*$ labelled $S$, with $R$ independent of $S$. In this setting we define the mixed
plug-in estimator

$$\hat G_j^2(R, S) = \hat B_j^2(R) + \hat B_j^2(S) - 2 \sum_{k \in \mathbb{Z}^d} \hat\alpha_{jk}(R)\, \hat\alpha_{jk}(S) \qquad (10)$$

and the two samples U-statistic estimator

$$\hat\Delta_j^2(R, S) = \frac{1}{A_n^2} \sum_{i \in I_n^2} \sum_{k \in \mathbb{Z}^d} \big[\Phi_{jk}(R_{i^1}) - \Phi_{jk}(S_{i^1})\big] \big[\Phi_{jk}(R_{i^2}) - \Phi_{jk}(S_{i^2})\big] \qquad (11)$$

assuming for simplification that both samples have same size $n$ (that would be different
from the size of the original sample). $\hat\Delta_j^2(R, S)$ is the exact replication (except for dimension
$d$ instead of 1) of the optimal estimator of $\int (f - g)^2$ for unrelated $f$ and $g$ found in Butucea
and Tribouley (2006). The risk of this optimal procedure is found in proposition 1.4.
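The two splitting schemes described above amount to simple index manipulations. The sketch below assumes the sample is stored as an array of shape $(n, d)$ with zero-based rows; the function names `split_d_plus_1` and `independent_product_sample` are illustrative and not taken from the paper.

```python
import numpy as np

def split_d_plus_1(x):
    """Split an (n, d) sample into R^0 (full rows) and R^1..R^d (one coordinate each).

    R^0 takes the first [n/(d+1)] rows; R^l takes coordinate l of the l-th
    following section of the same length.
    """
    x = np.asarray(x)
    n, d = x.shape
    size = n // (d + 1)
    r0 = x[:size]
    margins = [x[size * l: size * (l + 1), l - 1] for l in range(1, d + 1)]
    return r0, margins

def independent_product_sample(x):
    """Build IS: observation k takes coordinate l from row (k-1)d + l (one-based)."""
    x = np.asarray(x)
    n, d = x.shape
    blocks = n // d
    rows = np.arange(blocks)[:, None] * d + np.arange(d)[None, :]  # shape (blocks, d)
    return x[rows, np.arange(d)[None, :]]

toy = np.arange(12).reshape(6, 2)     # 6 observations in dimension 2
r0, margins = split_d_plus_1(toy)
product_sample = independent_product_sample(toy)
```

Each original row contributes at most one coordinate to the product sample, which is what makes its observations independent, at the price of keeping only $[n/d]$ of them.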
Bias variance trade-off



Let an estimator $\hat T_j$ be used in estimating the quadratic functional $K_\star = \int (f_A - f_A^*)^2$; using
(4), an upper bound for the mean squared error of this procedure when $f_A \in B_{s2\infty}(\mathbb{R}^d)$ is
given by

$$E^n_{f_A} (\hat T_j - K_\star)^2 \le 2 E^n_{f_A} (\hat T_j - C_j^2)^2 + C 2^{-4js}, \qquad (12)$$

which shows that the key estimation is that of the wavelet contrast $C_j^2(f_A - f_A^*)$ by the
estimator $\hat T_j$. Once an upper bound of the risk of $\hat T_j$ in estimating $C_j^2$ is known, balancing
the order of the bound with the squared bias $2^{-4js}$ gives the optimal resolution $j$. This is a
standard procedure in nonparametric estimation.

Before diving into the computation of risk bounds, we give a summary of the different
convergence rates in proposition 1.2 below. The estimators based on splits of the full sample
are optimal. $\hat D_j^2$ is almost parametric on $\{2^{jd} < n\}$ and is otherwise optimal.

Proposition 1.2 (Minimal risk resolution in the class $B_{s2\infty}$ and convergence rates)

Assume that $f$ belongs to $B_{s2\infty}(\mathbb{R}^d)$, and that projection is based on an $r$-regular wavelet $\varphi$, $r > s$.
Convergence rates for the estimators defined at the beginning of this section are the following:

Convergence rates

statistic                                                                          | $2^{jd} < n$       | $2^{jd} \ge n$
$\hat\Delta_j^2(R, S)$, $\hat G_j^2(R, S)$, $\hat F_j^2(R^0, R^1, \ldots, R^d)$   | parametric         | $n^{-8s/(4s+d)}$
$\hat D_j^2(X)$                                                                    | almost parametric  | $n^{-8s/(4s+d)}$
$\hat C_j^2(X)$                                                                    | $n^{-4s/(4s+d)}$   | inoperable

Table 7. Convergence rates at optimal $j^*$

The minimal risk resolution satisfies $2^{j^* d} \approx (<)\, n$ for parametric cases; $2^{j^*} \approx n^{2/(4s+d)}$ for $\hat D_j^2$,
$\hat\Delta_j^2$, $\hat G_j^2$ or $\hat F_j^2$ when $s \le d/4$, and $2^{j^*} \approx n^{1/(4s+d)}$ for $\hat C_j^2$.

Besov assumption about $f$ transfers to $f_A$ (see Barbedor, 2005). Using

$$E^n_{f_A} (\hat H_j - K_\star)^2 \le 2 E^n_{f_A} (\hat H_j - C_j^2)^2 + C 2^{-4js},$$

and balancing squared bias $2^{-4js}$ and variance of the estimator $\hat H_j$ yields the optimal resolution $j$.

■ from proposition 1.5, for estimator $\hat C_j^2(X)$, the bound is inoperable on $\{2^{jd} > n\}$. Otherwise
equating $2^{jd} n^{-1}$ with $2^{-4js}$ yields $2^j = n^{1/(4s+d)}$ and a rate in $n^{-4s/(4s+d)}$.

■ from propositions 1.4 and 1.3, for estimators $\hat F_j^2(R^0, R^1, \ldots, R^d)$, $\hat G_j^2(R, S)$ and $\hat\Delta_j^2(R, S)$, on
$\{2^{jd} > n\}$ equating $2^{jd} n^{-2}$ with $2^{-4js}$ yields $2^j = n^{2/(4s+d)}$ and a rate in $n^{-8s/(4s+d)}$; on $\{2^{jd} < n\}$ the
rate is parametric. Moreover $2^{jd} < n$ implies that $s > d/4$ and $2^{jd} \ge n$ implies that $s \le d/4$.

■ from proposition 1.6, for estimator $\hat D_j^2(X)$ on $\{2^{jd} > n\}$ equating $2^{jd} n^{-2}$ with $2^{-4js}$ yields
$2^j = n^{2/(4s+d)}$ and a rate in $n^{-8s/(4s+d)}$; on $\{2^{jd} < n\}$ the rate is found by equating $2^{j} n^{-1}$ with
□
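The balancing argument used for the split-sample and plug-in estimators can be turned into a small calculator. This is a sketch only, assuming the bounds $2^{jd} n^{-2}$ (split-sample estimators) and $2^{jd} n^{-1}$ (plug-in $\hat C_j^2$) are equated with the squared bias $2^{-4js}$; the function names and dictionary keys are illustrative, and the full-sample $\hat D_j^2$ case on $\{2^{jd} < n\}$ is deliberately left out.

```python
import math

def convergence_rates(s, d):
    """Exponents a such that the risk is of order n^a after balancing.

    'split' covers the split-sample estimators: parametric when s > d/4,
    otherwise n^(-8s/(4s+d)). 'plug_in' covers the plug-in contrast estimator.
    """
    split_exponent = -1.0 if s > d / 4 else -8 * s / (4 * s + d)
    plug_in_exponent = -4 * s / (4 * s + d)
    return {"split": split_exponent, "plug_in": plug_in_exponent}

def balancing_resolution(n, s, d, plug_in=False):
    """Real-valued j solving 2^j = n^(2/(4s+d)), or n^(1/(4s+d)) for the plug-in.

    For the split-sample estimators this is only meaningful when s <= d/4;
    in the parametric regime any j with small enough bias and 2^(jd) < n works.
    """
    power = (1.0 if plug_in else 2.0) / (4 * s + d)
    return power * math.log2(n)

rates = convergence_rates(s=0.25, d=2)
j_star = balancing_resolution(n=1024, s=0.5, d=2)
```

In practice the returned resolution would be rounded to an integer; the real value is kept here to make the exponents visible.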



1.3 Risk upper bounds in estimating the wavelet contrast 

In the forthcoming lines, we make the assumption that both the density and the wavelet
are compactly supported so that all sums in $k$ are finite. For simplicity we further suppose
the density support to be the hypercube, so that $\sum_{k \in \mathbb{Z}^d} 1 \approx 2^{jd}$.



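Proposition 1.3 (Risk upper bound, $d + 1$ independent samples: $f_A, f_A^{*1}, \ldots, f_A^{*d}$)

Let $\{X_1, \ldots, X_n\}$ be an independent, identically distributed sample of $f_A$. Let $\{R^\ell_1, \ldots, R^\ell_n\}$ be an
independent, identically distributed sample of $f_A^{*\ell}$, $\ell = 1, \ldots, d$. Assume that $f$ is compactly supported
and that $\varphi$ is a Daubechies D2N. Assume that the $d + 1$ samples are independent. Let $E^n_{f_A}$ be the
expectation relative to the joint distribution of the $d + 1$ samples. Then on $\{2^{jd} < n^2\}$,

$$E^n_{f_A} \big( \hat F_j^2(X, R^1, \ldots, R^d) - C_j^2 \big)^2 \le C n^{-1} + C 2^{jd} n^{-2}\, I\{2^{jd} > n\}.$$

For the U-statistic $\hat F_j^2(X, R^1, \ldots, R^d)$, with $\hat\alpha_{jk} = \hat\alpha_{jk}(X)$, $\hat\alpha_{jk^\ell} = \hat\alpha_{jk^\ell}(R^\ell)$ and
$\hat\lambda_{jk} = \hat\alpha_{jk^1} \cdots \hat\alpha_{jk^d}$,

$$(\hat F_j^2 - C_j^2)^2 \le 3 \Big[ \Big( \hat B_j^2(X) - \sum_k \alpha_{jk}^2 \Big)^2 + \Big( \prod_{\ell=1}^{d} \hat B_j^2(R^\ell) - \prod_{\ell=1}^{d} \sum_{k^\ell} \alpha_{jk^\ell}^2 \Big)^2 + 4 \Big( \sum_k \hat\alpha_{jk} \hat\lambda_{jk} - \sum_k \alpha_{jk} \lambda_{jk} \Big)^2 \Big].$$

On $\{2^{jd} < n^2\}$, by proposition 1.9 for the term on the left, proposition 1.10 for the
middle term, and proposition 1.11 for the term on the right, the quantity is bounded
by $C n^{-1} + C 2^{jd} n^{-2}$.
□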



Proposition 1.4 (Risk upper bound, 2 independent samples: $f_A$, $f_A^*$)

Let $X = \{X_1, \ldots, X_n\}$ be an independent, identically distributed sample of $X$ with density $f_A$. Let
$R = \{R_1, \ldots, R_n\}$ be an independent, identically distributed sample of $R$ with density $f_A^*$. Assume
that $f$ is compactly supported and that $\varphi$ is a Daubechies D2N. Assume that the two samples are
independent. Let $E^n_{f_A}$ be the expectation relative to the joint distribution of the two samples.

Then

$$E^n_{f_A} \big( \hat G_j^2(X, R) - C_j^2 \big)^2 \le C n^{-1} + C 2^{jd} n^{-2}\, I\{2^{jd} > n\},$$

$$E^n_{f_A} \big( \hat\Delta_j^2(X, R) - C_j^2 \big)^2 \le C^* n^{-1} + C 2^{jd} n^{-2},$$

with $C^* = 0$ at independence.



For the estimator $\hat G_j^2(X, R)$ the proof is identical to the proof of proposition 1.3, the
only difference being that $\hat\lambda_{jk}$ and $\lambda_{jk}$ no longer designate a product of $d$ one dimensional
coordinates but a full fledged $d$ dimensional coordinate, equivalent to $\hat\alpha_{jk}$ and $\alpha_{jk}$.
The only new quantity to compute is then $E^n_{f_A} \big( \sum_k \hat\alpha_{jk}(X) \hat\lambda_{jk}(R) - \sum_k \alpha_{jk} \lambda_{jk} \big)^2$, coming
from the crossed term.

Let $Q = E^n_{f_A} \big( \sum_k \hat\alpha_{jk}(X) \hat\lambda_{jk}(R) \big)^2$. Let $\theta = \sum_k \alpha_{jk} \lambda_{jk}$. Recall that
$\Omega_n^m = \{(i^1, \ldots, i^m): i^\ell \in \mathbb{N}, 1 \le i^\ell \le n\}$.

Let $|i|$ be the number of distinct coordinates of $i \in \Omega_n^4$. So that, estimators being plug-in, with a
sum on $\Omega_n^4$, with cardinality $n^4$,



ieo^ fei,fe2 



1 



+ 



|?|=4 |i|=3 fe 



+ ^ (4iV - 3)'' E]^^{Xf^{Rf 

|i|<2 fe 

with lines 2 and 3 expressing all possible matches between the coordinates of $i$, and using
lemma 1.7 to reduce double sums in $k_1, k_2$.

By independence of the samples, using lemma 1.8 and the fact that $|\{i \in \Omega_n^4: |i| = c\}| = O(n^c)$
given by lemma 1.2,

Q < ^ c-n-i + C E ^'fc + C E + <^""'2^'''- 

A- fc 

with = n!/(n - p)!. So that, with vl^"^ = 1 - | + Cn'^, 

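The rate is thus unchanged for $\hat G_j^2$ compared to the $d + 1$ sample case in the previous proposition.

Case $\hat\Delta_j^2(X, R)$

Recall that $I_n^m = \{(i^1, \ldots, i^m): i^\ell \in \mathbb{N}, 1 \le i^\ell \le n, i^{\ell_1} \ne i^{\ell_2} \text{ if } \ell_1 \ne \ell_2\}$.

For $i \in I_n^2$, let $h_{jk}(i) = [\Phi_{jk}(X_{i^1}) - \Phi_{jk}(R_{i^1})]\,[\Phi_{jk}(X_{i^2}) - \Phi_{jk}(R_{i^2})]$ and let $\theta = C_j^2$; so that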

^ ilMkiM 

_ / #{il,»2:|iini2| =0} .\n2._}_ V- \^ jpn (■ -sf, (■ X 

-[ 7j2)2 V "^M2)^ ^ 2^ Ef^hjk,{ii)njk2{i2), 

and by lemma 1.3 the quantity in parenthesis on the left is of the order of Cn~^. 
Label $Q(i_1, i_2)$ the quantity $E^n_{f_A} \sum_{k_1, k_2} h_{jk_1}(i_1) h_{jk_2}(i_2)$. Let also $\delta_{jk} = \alpha_{jk} - \lambda_{jk}$.
So that with only one matching coordinate between $i_1$ and $i_2$,

Q{H,i2) n i2\ = 1} = E^;^ Yl ^i^^^i^^^ + 

fel,fe2 



k k 
Again by lemma 1.7 and lemma 1.8, for X or R

fei,fe2 k k 

and since all other terms are bounded by a constant not depending on $j$, by lemma 1.3

$$(A_n^2)^{-2} \sum_{i_1, i_2} Q(i_1, i_2)\, I\{|i_1 \cap i_2| = 1\} \le C n^{-1}.$$

Likewise, the maximum order of $Q(i_1, i_2)\, I\{|i_1 \cap i_2| = 2\}$ is $\sum_k [E^n_{f_A} \Phi_{jk}(X)^2]^2$, and the
corresponding bound is $2^{jd} n^{-2}$.
□ 



Proposition 1.5 (Full sample $\hat C_j^2$ risk upper bound)

Let $X = X_1, \ldots, X_n$ be an independent, identically distributed sample of $f_A$. Assume that $f$ is
compactly supported and that $\varphi$ is a Daubechies D2N. Let $E^n_{f_A}$ be the expectation relative to
the joint distribution of the sample $X$. Let $\hat C_j^2$ be the plug-in estimator defined in (6). Then on

$$E^n_{f_A} \big( \hat C_j^2(X) - C_j^2 \big)^2 \le C 2^{jd} n^{-1}.$$



k k k 

By proposition 1.7 the first term is of the order of $2^{jd} n^{-1}$. By proposition 1.8 the two other
terms are of the order of $C n^{-1} + 2^{jd} n^{-2}\, I\{2^{jd} < n^2\}$.

□ 

As is now shown, the rate of $\hat D_j^2(X)$ computed on the full sample is slower than the one for
$\hat\Delta_j^2(R, S)$ in the two samples setting.

The reason is that we cannot always apply lemma 1.7, which reduces double sums in
$k_1, k_2$ to a sum on the diagonal $k_1 = k_2$ for translates of the same $\varphi$ functions. Indeed, when
a match between multi indices $i_1$ and $i_2$ involves terms corresponding to margins, it is not
guaranteed that a match on observation numbers also corresponds to a match on margin
numbers; that is to say, in the product $\varphi(X^{\ell_1} - k_1) \varphi(X^{\ell_2} - k_2)$, only once in a while $\ell_1 = \ell_2$,
so most of the time we can say nothing about the support of the product, and the sum
spans many more terms, hence the additional factor $2^j$ in the risk bound for $\hat D_j^2$ on the full
sample.
Proposition 1.6 (Risk upper bound, full sample: $f_A$)

Let $X_1, \ldots, X_n$ be an independent, identically distributed sample of $f_A$. Assume that $f$ is compactly
supported and that $\varphi$ is a Daubechies D2N. Let $\hat D_j^2$ be the U-statistic estimator defined in (7). Then

with $\delta_{jk}$ the coordinate of $f_A - f_A^*$ and $C^* = 0$ at independence, when $f_A = f_A^*$.



JA 



(13) 



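To make $\hat D_j(X)$ look more like the usual U-estimator of $\int (f - g)^2$ for unrelated $f$ and $g$,
we define for $i \in I_n^{2d+2}$ the dummy slice variables $Y_i = X_{i^1}$, $V_i = (X^1_{i^2},\dots,X^d_{i^{d+1}})$, $Z_i = X_{i^{d+2}}$,
$T_i = (X^1_{i^{d+3}},\dots,X^d_{i^{2d+2}})$; so that $Y_i$ and $Z_i$ have distribution $P_{f_A}$, $V_i$ and $T_i$ have distribution
$P_{f_A^\star} = P_{f^{\star 1}}\cdots P_{f^{\star d}}$ (once canonically projected), and $Y_i$, $V_i$, $Z_i$, $T_i$ are independent variables
under $P^n_{f_A}$. Next, for $k \in \mathbb{Z}^d$, define the function $\Lambda_{jk}$ as

$$\Lambda_{jk}(x_{i^1},\dots,x_{i^d}) = \varphi_{jk^1}(x^1_{i^1})\cdots\varphi_{jk^d}(x^d_{i^d}) \qquad \forall i \in \Omega_n^d,$$
$$\Lambda_{jk}(x_i) = \Phi_{jk}(x_i) = \varphi_{jk^1}(x^1_i)\cdots\varphi_{jk^d}(x^d_i) \qquad \forall i \in \Omega_n^1 = \{1,\dots,n\},$$

with second line taken as a convention.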

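So that $\hat D_j(X)$ can be written under the more friendly form

with $I_n^m = \{(i^1,\dots,i^m)\colon i^\ell \in \mathbb{N},\ 1 \le i^\ell \le n,\ i^{\ell_1} \ne i^{\ell_2} \text{ if } \ell_1 \ne \ell_2\}$.

Following the friendly notation, let $h_{ik} = [\Lambda_{jk}(Y_i) - \Lambda_{jk}(V_i)]\,[\Lambda_{jk}(Z_i) - \Lambda_{jk}(T_i)]$ be the kernel
of $\hat D_j(X)$ at fixed $k$. Then,

$$[\hat D_j(X)]^2 = |I_n^{2d+2}|^{-2}\sum_{i_1,i_2 \in I_n^{2d+2}}\ \sum_{k_1,k_2} h_{i_1k_1}h_{i_2k_2}.$$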



Consider the partitioning sets $M_c = \{(i_1,i_2) \in I_n^{2d+2}\times I_n^{2d+2}\colon |i_1 \cap i_2| = c\}$, $c = 0,\dots,2d+2$, that
is to say the set of pairs with $c$ coordinates in common. Equivalently, $M_c$ can be defined as
the set $\{(i_1,i_2) \in I_n^{2d+2}\times I_n^{2d+2}\colon |i_1 \cup i_2| = 4d + 4 - c\}$.

According to the partitioning, with $h_i = \sum_k h_{ik}$,

$$E^n_{f_A}[\hat D_j(X)]^2 = |I_n^{2d+2}|^{-2}\sum_{c=0}^{2d+2}\ \sum_{(i_1,i_2)\in M_c} E^n_{f_A}h_{i_1}h_{i_2}.$$

Let $\lambda_{jk} = \alpha_{jk^1}\cdots\alpha_{jk^d}$ and $\delta_{jk} = \alpha_{jk} - \lambda_{jk}$.

On $M_0$, with no match,

$$E^n_{f_A}h_{i_1}h_{i_2}I\{M_0\} = \sum_{k_1,k_2}\delta_{jk_1}^2\delta_{jk_2}^2 = \Big(\sum_k\delta_{jk}^2\Big)^2.$$

By lemma 1.3, the ratio $|M_0|/|I_n^{2d+2}|^2$ is lower than $1 + Cn^{-2}$. So that

On $M_1$, assuming the match involves $Y_{i_1}$ and $Y_{i_2}$,

ki ,k2 



fcl,fc2 



(14) 



Vfe / k \ k J \ k J 



withc| = Efe'52,. 



Next by (17) in lemma 1.7 for the first line, the double sum in $k$ under expectation is
bounded by a constant times the sum restricted to the diagonal $k_1 = k_2$ because of the
limited overlapping of translates $\varphi_{jk}$; using also lemma 1.8,



Since all other terms in (14) are clearly bounded by a constant not depending on $j$, we
conclude by symmetry that $E^n_{f_A}h_{i_1}h_{i_2}I\{M_1\} \le C$ for any match of cardinality 1 between
narrow slices ($Y_{i_1}Y_{i_2}$ or $Z_{i_1}Z_{i_2}$ or $Y_{i_1}Z_{i_2}$ or $Z_{i_1}Y_{i_2}$). Moreover $C = 0$ when $f_A = f_A^\star$ at
independence, because of the omnipresence of $\delta_{jk}$, the coordinate of $f_A - f_A^\star$.

On $M_1$, if the match is between $Y_{i_1}$ and $V_{i_2}$, a computation as in (14) yields,

E]^hi,hi, I {Ml} = - X Sjk^Sjk^E^^^jkr {Yi, )Kjk2 {Vi,) + ajkSjk + I ^ XjkSjk J ; 

ki,k2 k \ k / 

which can also be found from line 2 of (14) using the swap $\Lambda_{jk}(Y_{i_2}) \leftrightarrow -\Lambda_{jk}(V_{i_2})$ and
$\alpha_{jk} \leftrightarrow -\lambda_{jk}$.

Next, for some $\ell \in \{1,\dots,d\}$,

^ SjkJjk2Ef,^jkAYn)^jk2{Vi2) = SjkJjk2Xlt'^E]^^jk,{X)^jki{X') 

ki,k2 fei,fc2 

with special notation $\lambda_{jk}^{\langle r\rangle} = \alpha_{jk^1}^{p_1}\cdots\alpha_{jk^d}^{p_d}$ for some $p_i$, $0 \le p_i \le r$, $\sum_{i=1}^d p_i = r$.

In the present case $\varphi_{jk_1^\ell}(X^\ell)\varphi_{jk_2^\ell}(X^\ell) = \varphi_{jk_1^\ell}(X^\ell)\varphi_{jk_2^\ell}(X^\ell)\,I\{|k_1^\ell - k_2^\ell| < 2N - 1\}$ does not give
any useful restriction of the double sum because the coefficient $\alpha_{jk}$ hidden in $\delta_{jk}$ is not
guaranteed to factorize under any split of dimension unless $A = I$; and lemma 1.7 is
useless. This is a difficulty that did not arise in propositions 1.3 and 1.4 because we could
use the fact that these kinds of terms were estimated over independent samples.

Instead write E]^\^jk^{X)ipjk2{X'^)\ < '^^M\ooE]^\^jk^{X)\ < (725 2"^ using lemma 1.8. So 
that when multiplied by Ylik ^k ^3k^'jk~^\ using Meyer's lemma, the final order is 2^ . 
By symmetry, for any match of cardinality 1 between a narrow and a wide slice ($Y$ or $T$ or
equivalent pairing), $E^n_{f_A}|h_{i_1}h_{i_2}|I\{M_1\} \le C2^{j}$, with $C = 0$ at independence.

On $M_1$, if the match is between $V_{i_1}$ and $V_{i_2}$, by symmetry with (14) or using the swap
defined above,

kiM k \ k / \ k / 

and for some not necessarily matching $\ell_1, \ell_2 \in \{1,\dots,d\}$ (i.e. lemma 1.7 not applicable),

fel,fe2 fel,fe2 

k 

with last line using Meyer's lemma, and having reduced the term under expectation to a
constant by Cauchy-Schwarz inequality and lemma 1.8.

And we conclude again that, for any match of cardinality 1 between two wide slices ($V$ or
$T$ or equivalent), $E^n_{f_A}h_{i_1}h_{i_2}I\{M_1\} \le C2^{j}$, with $C = 0$ at independence.

By lemma 1.3, the ratio $|M_1|/|I_n^{2d+2}\times I_n^{2d+2}| \approx Cn^{-1}$, so in summary, the bound for $M_1$ has
the order $C^\star 2^{j}n^{-1}$, with $C^\star = 0$ at independence.

On $M_c$, $c = 2,\dots,2d+2$.

Fix the pair of indexes $(i_1,i_2) \in I_n^{2d+2}\times I_n^{2d+2}$; we need to bound a term having the form

$$Q(i_1,i_2) = E^n_{f_A}\sum_{k_1,k_2}\Lambda_{jk_1}(R_{i_1})\Lambda_{jk_1}(S_{i_1})\Lambda_{jk_2}(R'_{i_2})\Lambda_{jk_2}(S'_{i_2})$$

where both slices $R_{i_1} \ne S_{i_1}$, unrelated with both slices $R'_{i_2} \ne S'_{i_2}$, are chosen among any of
the dummy $Y$, $V$, $Z$, $T$.

Narrow slices only. For a match spanning four narrow slices exclusively, that is to say
$(Y_{i_1} = Y_{i_2}) \cap (Z_{i_1} = Z_{i_2})$ or $(Y_{i_1} = Z_{i_2}) \cap (Z_{i_1} = Y_{i_2})$, a case possible on $M_2$ only, the general
term of higher order is written $\sum_{k_1,k_2} E^n_{f_A}\Phi_{jk_1}(X)\Phi_{jk_2}(X)\,E^n_{f_A}\Phi_{jk_1}(Y)\Phi_{jk_2}(Y)$. By lemma 1.7
this is again lower than $(4N-3)^d\sum_k\big[E^n_{f_A}\Phi_{jk}(X)^2\big]^2$, that is $C2^{jd}$. By lemma 1.3, this case
thus contributes to the general bound up to $C2^{jd}n^{-2}$.

Three narrow slices only is not possible and two narrow slices correspond to the case $M_1$
treated above.

Wide slices only. For a match spanning wide slices on $M_c$, $c = 2,\dots,2d$, a general term with
higher order is written $\sum_{k_1,k_2} E^n_{f_A}\Lambda_{jk_1}(V_{i_1})\Lambda_{jk_1}(T_{i_1})\Lambda_{jk_2}(V_{i_2})\Lambda_{jk_2}(T_{i_2})$, with $|i_1 \cap i_2| = c$ (an
equivalent is obtained by swapping one $V$ with a $T$). Since the slices are wide, it is not
possible to distribute expectation any further right now: if $V_{i_1}$ is always independent of $T_{i_1}$,
both terms may depend on $V_{i_2}$, say. Also matching coordinates on $i_1$, $i_2$ do not necessarily
correspond to matching dimensions of the observation, and then lemma 1.7 is not
applicable. Instead write,



Q{h,i2)=Y: A]fer^^Aj,^^^>iJ74A<,^>(y.,,T.jA£(y.2,T.2) 

ki ,k2 
with $\Lambda^{(c)}_{jk}(V_i,T_i)$ a product of $c$ independent terms of the form $\varphi_{jk^\ell}(X^\ell)$ spanning at least
one of the slices $V_i$, $T_i$.

By definition of ii and i2, the product of 2c terms under expectation can be split into c 
independent products of two terms. So, using E'f^\ipjf^e{XY\ < C on each bi-term, the order 

at the end is Ci^,^ •^j^'^ '^^)^ 5 using Meyer's lemma, the bound is then of the order of 

Finally, using lemma 1.3 as above, the contribution of this kind of term to the general 
bound is Y^lti '^^"n'". 

On {2^ <n}D {2^'^ < n^} D {2^'^ < n}, this quantity is bounded by C2^n-i < C2^'^n-'^ and 
on {2^ > n} it is unbounded. 

Narrow and wide slices Reusing the general pattern above, with < 2d matching coordinates 
on wide slices and c^. < 2 on narrow slices 

with A^j^\Yi,Vi,Zi,Ti) a product of c independent terms of the form ipj^e{X) or ^jk{X) 
spanning at least one of the slices Vi, Ti and one of the slices Yi, Zi. As above, the bracket 
is a product of independent bi-terms, each under expectation bounded by some constant 
C, by lemma 1.8, using Cauchy-schwarz inequality if needed. So this is bounded by 

fci,fc2 k 

using Cauchy-Schwarz inequality and Meyer's lemma this is bounded by 22('="'~'^)2^('^''~^) 
and, with lemma 1.3, the contribution to the general bound on {2^ < n^} D {2^'^ < n^} is 

2 2d 



2-^''^^2¥n-''2^n-«l{2^- < n^} < 



a=l 6=1 



Finally on $\{2^{jd} < n^2\}$, $E^n_{f_A}\hat D_j^2 - \big(\sum_k\delta_{jk}^2\big)^2 \le C^\star 2^{j}n^{-1} + C2^{jd}n^{-2}$. □



Implementation issues 

The statistic is a plug-in estimator; its evaluation first requires the complete
estimation of the density $f_A$ and its margins, which takes a computing time of the order
of $O(n(2N-1)^d)$, where $N$ is the order of the Daubechies wavelet and $n$ the number of
observations.

In the second place, the actual contrast is a simple function of the $2^{jd} + d2^{j}$ coefficients that
estimate density $f_A$ and its margins; the additional computing time is then in $O(2^{jd})$.
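The two steps above can be illustrated with a minimal sketch, assuming the Haar scaling function in place of a general Daubechies $D2N$, data rescaled to $[0,1)^d$, and the contrast taken as $\sum_k(\hat\alpha_{jk} - \hat\lambda_{jk})^2$ with $\hat\lambda_{jk}$ the product of marginal coefficients. The function name `haar_plugin_contrast` is hypothetical and not taken from the paper.

```python
import numpy as np

def haar_plugin_contrast(x, j):
    """Plug-in contrast sum_k (alpha_hat_jk - lambda_hat_jk)^2 at resolution j.

    Assumption: Haar scaling function, observations x of shape (n, d) in [0, 1)^d.
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    bins = 2 ** j
    # Cell index k^l of each coordinate of each observation.
    cells = np.minimum((x * bins).astype(int), bins - 1)

    # Step 1: joint coefficients, alpha_hat_jk = 2^{jd/2} * (fraction of points in cell k).
    flat = np.ravel_multi_index(tuple(cells.T), (bins,) * d)
    joint = np.bincount(flat, minlength=bins ** d).reshape((bins,) * d)
    alpha = 2.0 ** (j * d / 2) * joint / n

    # Marginal coefficients and their tensor product lambda_hat_jk.
    lam = np.ones(())
    for l in range(d):
        marg = 2.0 ** (j / 2) * np.bincount(cells[:, l], minlength=bins) / n
        lam = np.multiply.outer(lam, marg)

    # Step 2: the contrast is a simple function of the 2^{jd} + d 2^j coefficients.
    return float(np.sum((alpha - lam) ** 2))
```

With Haar the first step reduces to a histogram, so its cost is linear in $n$; the second step visits all $2^{jd}$ cells, which is where the exponential dependence on $d$ shows up.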






One can see here the main numerical drawback of the wavelet contrast in its total
formulation: it is of exponential complexity in the dimension $d$ of the problem. But this is by
definition the cost of a condition that guarantees mutual independence of the components
in full generality: $d$ sets $B_1,\dots,B_d$ are mutually independent if $P(B_{i_1}\cap\dots\cap B_{i_k}) = PB_{i_1}\cdots PB_{i_k}$
for each of the $2^d$ choices of indices in $\{1,\dots,d\}$.

Complexity in $jd$ drops down to $O(d^2 2^{2j})$ if one concentrates on pairwise independence, as
in kernel ICA and related methods, and in the minimum marginal entropy type method
of Miller and Fisher III (2003). Pairwise independence is in fact equivalent to mutual
independence in the no noise ICA model and with at most one Gaussian component
(Comon, 1994).

The pairwise algorithm used by Miller and Fisher (2003) consists in searching for the
minimum in each of the free planes of $\mathbb{R}^d$, applying Jacobi rotations to select a particular
plane. A search in each plane is equivalent to the case $d = 2$, where the problem is to find the
minimum in $\theta$ of a function on $\mathbb{R}$, for $\theta \in [0, \pi/2]$. To do so, the simplest could be to try out
all points from $0$ to $\pi/2$ along a grid, or to use bisection type methods.
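The grid search in one plane can be sketched as follows. This is only an illustration under stated assumptions: `contrast` is any user-supplied function of the rotated two-dimensional sample (for instance a wavelet contrast estimate), and the names `rotate` and `grid_search_angle` are hypothetical.

```python
import numpy as np

def rotate(x, theta):
    """Apply the plane rotation of angle theta to each row of the (n, 2) array x."""
    c, s = np.cos(theta), np.sin(theta)
    return x @ np.array([[c, -s], [s, c]]).T

def grid_search_angle(x, contrast, num=91):
    """Evaluate the contrast on a regular grid of [0, pi/2] and keep the minimizer."""
    thetas = np.linspace(0.0, np.pi / 2, num)
    values = np.array([contrast(rotate(x, t)) for t in thetas])
    best = int(np.argmin(values))
    return thetas[best], values[best]
```

For $d > 2$ the same routine would be called once per pair of coordinates, each call acting on the two columns selected by the Jacobi rotation.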

U-statistic estimators of $C_j$ have complexity at minimum in $O(n^2(2N-1)^{2d})$, that is to say
quadratic in $n$, like the method of Tsybakov and Samarov (2002) which also attains the parametric
rate of convergence; on the other hand the complexity in $jd$ is probably lowered since the
contrast can be computed by accumulation, without it being necessary to keep all projections
in memory, but only a window whose width depends upon the length of the Daubechies
filter.



1.4 Appendix 1 — Propositions 

Proposition 1.7 (2nd moment of $\sum_k\hat\alpha_{jk}^2$ about $\sum_k\alpha_{jk}^2$)

Let $X_1,\dots,X_n$ be an independent, identically distributed sample of $f$, a compactly supported function
defined on $\mathbb{R}^d$. Assume that $\varphi$ is a Daubechies $D2N$. Let $\hat\alpha_{jk} = \frac{1}{n}\sum_{i=1}^n\varphi_{jk^1}(X_i^1)\cdots\varphi_{jk^d}(X_i^d)$,




For the mean, using lemma 1.8, 





•d 



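For the second moment, let $M_c = \{(i_1,i_2,i_3,i_4) \in \{1,\dots,n\}^4\colon |\{i_1\}\cup\dots\cup\{i_4\}| = c\}$.

$$E^n_f\Big(\sum_k\hat\alpha_{jk}^2\Big)^2 = \frac{1}{n^4}\sum_{c=1}^{4}\ \sum_{i\in M_c} E^n_f\sum_{k_1,k_2}\Phi_{jk_1}(X_{i_1})\Phi_{jk_1}(X_{i_2})\Phi_{jk_2}(X_{i_3})\Phi_{jk_2}(X_{i_4})\,I\{M_c\}$$

On $c = 1$, the kernel is equal to $\sum_{k_1,k_2}\Phi_{jk_1}(X)^2\Phi_{jk_2}(X)^2 \le (4N-3)^d\sum_k\Phi_{jk}(X)^4$ by lemma
1.7. And by lemma 1.8, $E^n_f\sum_k\Phi_{jk}(X)^4 \le \sum_k C2^{jd} = C2^{2jd}$.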

On $c = 2$, the kernel takes three generic forms: (a) $\sum_{k_1,k_2}\Phi_{jk_1}(X)\Phi_{jk_1}(Y)\Phi_{jk_2}(X)\Phi_{jk_2}(Y)$ or
(b) $\sum_{k_1,k_2}\Phi_{jk_1}(X)^2\Phi_{jk_2}(Y)^2$ or (c) $\sum_{k_1,k_2}\Phi_{jk_1}(X)\Phi_{jk_1}(Y)\Phi_{jk_2}(Y)^2$. In cases (a) and (c), using
lemma 1.7, the double sum can be reduced to the diagonal $k_1 = k_2$. So using also lemma
1.8,

(a) E]^ I (y)$,.fe, (F) I < (4Ar - if E ^^''^C^)'^.-'^ (^)' < C2^' 

fcl ,fc2 

(c) E '^,kAx)^3kAy)'^3kAyf\ < E],m - irj2\^Ax)^AYf\ < C2^''- 



fel,fe2 



On c = 3 the only representative form is 



ki^k2 k k 

\2 



and on $c = 4$ the statistic is unbiased, equal to $\big(\sum_k\alpha_{jk}^2\big)^2$ under expectation.

Next, since $|M_4| = A_n^4$ and, using lemma 1.2, $|M_c| = O(n^c)$,

Eu (E ^ ^nn-^(E + + + ""'2^' 

fe k 

< (E a|fe)^ + + Cn-^2^^I{2^^ < n} + Cn-^2'^^^l{2^'' > n} 

k 

with $A_n^4 n^{-4} = 1 - \frac{6}{n} + Cn^{-2}$.
Finally 

(E ^k - E "f^)' = E'h (E -?^)' + (E - 2i?}^ E E 



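Proposition 1.8 (2nd moment of $\sum_k\hat\lambda_{jk}^2$ about $\sum_k\lambda_{jk}^2$ and of $\sum_k\hat\alpha_{jk}\hat\lambda_{jk}$ about $\sum_k\alpha_{jk}\lambda_{jk}$)

Let $X_1,\dots,X_n$ be an independent, identically distributed sample of $f$, a compactly supported function
defined on $\mathbb{R}^d$. Assume that $\varphi$ is a Daubechies $D2N$. Let $\hat\lambda_{jk} = \frac{1}{n}\sum_{i=1}^n\varphi_{jk^1}(X_i^1)\cdots\frac{1}{n}\sum_{i=1}^n\varphi_{jk^d}(X_i^d)$,

Then on $\{2^{jd} < n^2\}$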

(E ^^•'^"^•^ - E ^.'^"ifc)' ^ + 

fe k ' " 

k k 




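For $i \in \Omega_n^{2d}$, let $V_i$ be the slice $(X^1_{i^1}, X^1_{i^2},\dots,X^d_{i^{2d-1}}, X^d_{i^{2d}})$. Let the coordinate-wise kernel
function $\Lambda_{jk}$ be given by $\Lambda_{jk}(V_i) = \varphi_{jk^1}(X^1_{i^1})\varphi_{jk^1}(X^1_{i^2})\cdots\varphi_{jk^d}(X^d_{i^{2d-1}})\varphi_{jk^d}(X^d_{i^{2d}})$.

Let $|i|$ be the shortcut notation for $|\{i^1\}\cup\dots\cup\{i^{2d}\}|$. Let $W_n^{2d} = \{i \in \Omega_n^{2d}\colon |i| < 2d\}$, that is to
say the set of indices with at least one repeated coordinate.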

Then the mean term is written 

Vl^2d k k 

Let $M_c = \{i \in \Omega_n^{2d}\colon |i| = c\}$ be the set of indices with $c$ common coordinates. So that $Q_1$ is
written

2d-l 

c=l Mc fc fe 

By lemma 1.4 with lemma parameters $(d = 1, m = 2d, r = 1)$, $E^n_f|\Lambda_{jk}(V_i)|\,I\{M_c\} \le C2^{\frac{j}{2}(2d-2c)}$
and by lemma 1.2, $|M_c| = O(n^c)$. Hence,

2d-l 2d~l (2d-c) 



za— 1 za— 1 /o7\ 

Oyfc < E n""'+"t^2^'^''""^ = S-^'' E ^ ( 7^ ) 

c=l c=l ^ ^ 



which on $\{2^{jd} < n\}$ has maximum order $2^{j(1-d)}n^{-1}$ when $2d - c$ is minimum i.e. $c = 2d - 1$.
Finally $|Q_1| \le \sum_k C2^{j(1-d)}n^{-1} \le C2^{j}n^{-1}$.

Next, the second moment about zero is written 

(E ^'fe) ' = E E ^i-^i (^^1 (^^^ ) 

fe il,i2e(n2<i)2 fcl,fc2 

= Q2 + Atfrr^'^e'' 

with $W_n^{4d} = \{(i_1,i_2) \in (\Omega_n^{2d})^2\colon |i_1 \cup i_2| < 4d\}$, that is to say the set of indices with at least one
repeated coordinate somewhere.

Let this time $M_c = \{(i_1,i_2) \in (\Omega_n^{2d})^2\colon |i_1 \cup i_2| = c\}$ be the set of indices with overall $c$ common
coordinates in $i_1$ and $i_2$. So that $Q_2$ is written

C=l Mc fel,fe2 fel,fe2 
By lemma 1.6, unless $c = 1$, it is always possible to find indices $i_1, i_2$ with no match between
the observations falling under $k_1$ and those falling under $k_2$, so that there is no way to
reduce the double sum in $k_1, k_2$ to a sum on the diagonal using lemma 1.7. Note that if
$c = 1$, $E^n_f\Lambda_{jk}(V_i)\Lambda_{jk}(V_i) = E^n_f\Phi_{jk}(X)^4$ has order $C2^{jd}$.

So coping with the double sum, by lemma 1.4 with lemma parameters $(d = 1, m = 2d, r = 2)$,
$E^n_f|\Lambda_{jk_1}(V_{i_1})\Lambda_{jk_2}(V_{i_2})| \le C2^{\frac{j}{2}(4d-2c)}$, and again by lemma 1.2 $|M_c| = O(n^c)$, so $E^n_f|Q_{2,jk_1,jk_2}| \le
\sum_{c=1}^{4d-1}n^{c-4d}C2^{\frac{j}{2}(4d-2c)}$, which on $\{2^{jd} < n\}$ has maximum order $2^{j(1-2d)}n^{-1}$ when $c = 4d - 1$.
Finally, $E^n_fQ_2 \le \sum_{k_1,k_2}C2^{j(1-2d)}n^{-1} \le C2^{j}n^{-1}$.

Putting all together, and since A^n-P = 1 - + 0(n-2), 

k 

= Q2- 20Qr + e^{l + A^^n-^" - 2Al^n-^'') < IQ2I + 2^|Qi| + 0(n-') 
< C2%-i 



For the cross product, 

As above, for $i \in \Omega_n^{d+1}$, let $V_i$ be the slice $(X_{i^0}, X^1_{i^1},\dots,X^d_{i^d})$. Let the coordinate-wise kernel
function $\Lambda_{jk}$ be given by $\Lambda_{jk}(V_i) = \Phi_{jk}(X_{i^0})\varphi_{jk^1}(X^1_{i^1})\cdots\varphi_{jk^d}(X^d_{i^d})$. Let $\theta = \sum_k\alpha_{jk}\lambda_{jk}$.

Let $W_n^{d+1} = \{i \in \Omega_n^{d+1}\colon |i| < d + 1\}$, that is to say the set of indices with at least one repeated
coordinate.

So that, E]^Y.k^3khk = Qi + Ai+^n-''-^ with Qi = n-''~^Y.w^+^T.kE]^^^jk{Vi) and 
likewise 

El. (E "i^^i^) ' = Q^ + Af+^n-^'^-H^ 

k 

with Q2 = n~'^'^~'^ J2yy2d+2 X^fei.fe Ef^AjkiiVij^)Ajk2{Vi2)- And we obtain in the same way, 
^fA (E "^^kXjk - ajkXjky < IQ2I + 2^|gi| + 0(n-2) 

k 

Let $M_c = \{i \in \Omega_n^{d+1}\colon |i| = c\}$ be the set of indices with $c$ common coordinates. So that $Q_1$ is
written

d 

Qi = "-"^-1 Yl ^ E E EfAMVi) = E Q^i" 

c=l Mc fe k 

By lemma 1.4 with lemma parameters {rrid = l,ini = d,r = 1), 

E]JAjk{Vi)\I{M^} < C2^{^-'i'^d)2W-'^<^^) 
with ci + Cd = c, < ci < 1 < Cd < 1 and by lemma 1.2, |Mc| = 0(71"=). Hence, 

d / „j X (d+l-c) 

C=:l C=l V ^ / 
which on $\{2^{jd} < n\}$ has maximum order $C2^{j(1-d)}n^{-1}$ when $d + 1 - c$ is minimum i.e. $c = d$.
Finally $|Q_1| \le \sum_k C2^{j(1-d)}n^{-1} \le C2^{j}n^{-1}$.



Next, as above $Q_2 = \sum_{k_1,k_2}Q_{2,jk_1,jk_2}$, and again by lemma 1.6, unless $c = 1$, it is always
possible to find indices $i_1, i_2$ with no matching coordinates corresponding also to matching
dimension number, so that there is no way to reduce the double sum in $k_1, k_2$ to a sum on
the diagonal using lemma 1.7.

So coping once more with the double sum, by lemma 1.4 with lemma parameters (m^ = 
l,mi = d,r = 2), EfJAjk{Vi,)Ajk{Vi,)\ < 02^^ (2-20,) 2^24-20,) ^ ^j^h ci + q = c, 1 < q < 2, 
< ci < 2d, and again by lemma 1.2 \Mc\ = 0{n'^), so 

2d+l 2d+l ^ s (2d+2-c) 

E]^ \Q2nW. I < n'=-2'^-2C2^('^-<«-<^+<«--i) = 2^-(-2+(i-d)c.) Y^ci-) 

c=l c=l ^ ^ 

which on \2^'^ < n} has maximum order C2~^^n^^ when c = 2d+l. Then either = 1, which 
means that the two terms ^jki{Xij^)^jk2{Xi^) match on the observation number, in which 
case the sum in fci, can be reduced ; either cd = 2. In the first case the order is E^^Qi < 
[AN -2,YY,^C2-^'^n-^ < Cn,-'^ and in the second case Ey^Q2 < T^k^M C2^~'^^'^n-^ < C2^n-'^. 
□ 



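Proposition 1.9 (Variance of $\hat B_j^2$)

Let $\{X_1,\dots,X_n\}$ be an i.i.d. sample with density $f$. Assume that $f$ is compactly supported and that
$\varphi$ is a Daubechies $D2N$.

Let $\hat B_j^2 = \sum_k\frac{1}{A_n^2}\sum_{i\in I_n^2}\Phi_{jk}(X_{i^1})\Phi_{jk}(X_{i^2})$ be the U-statistic estimator of $\sum_k\alpha_{jk}^2$.
Then on $\{2^{jd} < n^2\}$,

$$E^n_f\Big(\hat B_j^2 - \sum_k\alpha_{jk}^2\Big)^2 \le Cn^{-1} + C2^{jd}n^{-2}.$$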
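As an illustration of this U-statistic, here is a minimal sketch assuming the Haar scaling function on $[0,1)^d$ instead of a general Daubechies $D2N$. In that case $\sum_{i^1 \ne i^2}\Phi_{jk}(X_{i^1})\Phi_{jk}(X_{i^2}) = 2^{jd}N_k(N_k - 1)$, with $N_k$ the number of observations in cell $k$, so the estimator can be computed from cell counts. The function names are hypothetical; the brute-force version is included only as a cross-check of the definition.

```python
import numpy as np
from itertools import permutations

def haar_phi(x, j, k):
    """Tensor Haar scaling function Phi_jk evaluated at each row of x (shape (n, d))."""
    inside = np.all(np.floor(x * 2 ** j).astype(int) == np.asarray(k), axis=1)
    return 2.0 ** (j * x.shape[1] / 2) * inside

def ustat_sum_alpha_sq(x, j):
    """U-statistic estimate of sum_k alpha_jk^2 from cell counts (Haar assumption)."""
    n, d = x.shape
    bins = 2 ** j
    cells = np.minimum((x * bins).astype(int), bins - 1)
    flat = np.ravel_multi_index(tuple(cells.T), (bins,) * d)
    counts = np.bincount(flat, minlength=bins ** d)
    return float(2.0 ** (j * d) * np.sum(counts * (counts - 1)) / (n * (n - 1)))

def ustat_brute(x, j):
    """Same quantity from the definition: average over ordered pairs of distinct indices."""
    n, d = x.shape
    total = 0.0
    for k in np.ndindex(*[2 ** j] * d):
        phi = haar_phi(x, j, k)
        total += sum(phi[a] * phi[b] for a, b in permutations(range(n), 2))
    return total / (n * (n - 1))
```

The count-based form is linear in $n$ only because Haar translates do not overlap; for a longer Daubechies filter the pairwise cost discussed in the implementation section applies.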



Write that, 

E]^ ' = n-\n - J2 E ^i*^! i^iO^Jki {Xii)^Jk, {X,2) 

On $M_4 = \{(i_1,i_2) \in I_n^2\times I_n^2\colon |i_1 \cup i_2| = 4\}$, i.e. with no match between the two indices, the kernel
$h_{i_1}h_{i_2} = \sum_{k_1,k_2}\Phi_{jk_1}(X_{i_1^1})\Phi_{jk_1}(X_{i_1^2})\Phi_{jk_2}(X_{i_2^1})\Phi_{jk_2}(X_{i_2^2})$ is unbiased, equal under expectation to
$\big(\sum_k\alpha_{jk}^2\big)^2$.

On $M_c$, $c = 2, 3$, with at least one match between $i_1$ and $i_2$, lemma 1.7 is applicable to reduce
the double sum in $k_1, k_2$ and,

Elhi,hi2l{M2UM3}= Y E '^jkAXii)<^jkAXq)<^jk2{Xii)<Pjk2{Xi2)l{M2lJMs} 

ii,i2&I^ ki,k2 

< Y (4A^-3)'^$^|$,-fe(X,i)$,fc(X,.)$,fe(X,i)$,fc(X,|)| 

M2,M3 k 

< ^ (7^2-''''(^~l'i'-''^l) = C Yj 2-'''"^^~I'i'^*='I), 

M2,M3 k M2,M3 
using lemma 1.4 with parameter $m = 2$ and $r = 2$ for line 3.

Next, by lemma 1.2, $|M_c| = O(n^c)$ and $|M_4|$ divided by $(A_n^2)^2$ is more precisely equal to
$1 - 4n^{-1} + Cn^{-2}$. So that

k c=2 fe 



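Proposition 1.10 (Variance of multisample $\prod_\ell\hat B_j^2(R^\ell)$)

Let $\{R^\ell_1,\dots,R^\ell_n\}$ be an i.i.d. sample of $f^{\star\ell}$, $\ell = 1,\dots,d$. Assume that $f$ is compactly supported and
that $\varphi$ is a Daubechies $D2N$. Assume that the $d$ samples are independent.

Let $\hat B_j^2(R^\ell) = \sum_{k^\ell}\frac{1}{A_n^2}\sum_{i\in I_n^2}\varphi_{jk^\ell}(R^\ell_{i^1})\varphi_{jk^\ell}(R^\ell_{i^2})$ be the U-statistic estimator of $\sum_{k^\ell}\alpha_{jk^\ell}^2$, $\ell = 1,\dots,d$.
Then on $\{2^{jd} < n^2\}$,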



e=i fei,...,fe<* 



Successive application of $ab - cd = (a - c)b + (b - d)c$ leads to

$$a_1\cdots a_d - b_1\cdots b_d = \sum_{\ell=1}^{d}(a_\ell - b_\ell)\,b_1\cdots b_{\ell-1}\,a_{\ell+1}\cdots a_d. \qquad (15)$$
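Identity (15) is a telescoping sum and can be checked numerically. The snippet below is a small sketch; the helper name `telescoped_difference` is hypothetical.

```python
import math

def telescoped_difference(a, b):
    """Right-hand side of (15): sum over l of (a_l - b_l) b_1...b_{l-1} a_{l+1}...a_d."""
    d = len(a)
    return sum(
        (a[l] - b[l]) * math.prod(b[:l]) * math.prod(a[l + 1:])
        for l in range(d)
    )

# Left-hand side for comparison: a_1...a_d - b_1...b_d.
a, b = [2, 3, 5, 7], [1, 4, 6, 2]
lhs = math.prod(a) - math.prod(b)
rhs = telescoped_difference(a, b)
```

With integer inputs both sides agree exactly, so the comparison involves no rounding.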



So applying (15), 



^ ^ Ajj. — X^f, — ^ ^ a'Hi.i . . . a^^ud — . . . 



2-2 2 2 

Jfti • • • "ifc-i ~ "ifei • • • "jfe"* 

fci...fe<i 



^=1 /s< /s<+i fed 



And 



(E % - ' ^ ^ E ^(E(-.v - "IfeO E • • • E ".'.^1 ' 

k 1=1 fe< fe<+i fed 



Label Q = i?;^EfeA|,-A^\)'. 
If the $d$ samples are independent, if $2^{jd} < n^2$, and by proposition 1.9 with parameter $d = 1$,

d-l 

□ 



Proposition 1.11 (Variance of multi sample $\sum_k\hat\alpha_{jk}\hat\lambda_{jk}$)

Let $\{X_1,\dots,X_n\}$ be an independent, identically distributed sample of $f_A$. Let $\{R^\ell_1,\dots,R^\ell_n\}$ be an
independent, identically distributed sample of $f^{\star\ell}$, $\ell = 1,\dots,d$. Assume that $f$ is compactly supported
and that $\varphi$ is a Daubechies $D2N$. Assume that the $d+1$ samples are independent. Let $E^n_{f_A}$ be the
expectation relative to the joint samples.

Then

(J2 c^3k{X)Xjk{R\ ...R'-)-Y, o^jkXjk) ^ <Cn-H {2^ <n}+ CV'^n-'^-^ I {2^' > n} 

k k 



Let $Q = E^n_{f_A}\big(\sum_{k\in\mathbb{Z}^d}\hat\alpha_{jk}\hat\lambda_{jk}\big)^2$; expanding the statistic,

By independence of the samples, we only need to consider local constraints on the
coordinates of $i \in \Omega_n^{2d+2}$.

Let $a$ be a subset of $\{0,1,\dots,d\}$. Let $J_a = \{i \in \Omega_n^{2d+2}\colon \ell \in a \Rightarrow i^{2\ell+1} = i^{2\ell+2};\ \ell \notin a \Rightarrow i^{2\ell+1} \ne
i^{2\ell+2}\}$. It is clear that $|J_a| = (n(n-1))^{d+1-|a|}n^{|a|}$ and that the $J_a$'s define a partition of $\Omega_n^{2d+2}$
when $a$ describes the $2^{d+1}$ subsets of $\{0,1,\dots,d\}$. One can check that there are $C_{d+1}^c$ distinct
sets $a$ such that $|a| = c$, and that $\sum_{c=0}^{d+1}C_{d+1}^c n^c(n(n-1))^{d+1-c} = n^{d+1}\sum_{c=0}^{d+1}C_{d+1}^c(n-1)^{d+1-c} = n^{2d+2}$.
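The cardinalities above can be verified by enumeration for small $n$ and $d$. The sketch below assumes that $J_a$ collects the indices whose consecutive coordinate pairs $(i^{2\ell+1}, i^{2\ell+2})$ coincide exactly for $\ell \in a$; this reading of the definition, and the name `ja_sizes`, are assumptions.

```python
from itertools import product

def ja_sizes(n, d):
    """Group the n^(2d+2) indices by the set a of pairs (2l+1, 2l+2) that coincide."""
    sizes = {}
    for i in product(range(n), repeat=2 * d + 2):
        # Zero-based positions 2l and 2l+1 hold the coordinates i^{2l+1} and i^{2l+2}.
        a = frozenset(l for l in range(d + 1) if i[2 * l] == i[2 * l + 1])
        sizes[a] = sizes.get(a, 0) + 1
    return sizes

def ja_formula(n, d, a):
    """Closed form (n(n-1))^(d+1-|a|) n^|a| for the cardinality of J_a."""
    return (n * (n - 1)) ** (d + 1 - len(a)) * n ** len(a)
```

Summing the sizes over all $2^{d+1}$ subsets recovers $n^{2d+2}$, which is the binomial identity stated in the text.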

On $J_\emptyset$ the kernel is unbiased. On $J_a$, $0 \in a$, with the first two coordinates matching, the
sum in $k_1, k_2$ can be reduced to a sum on the diagonal by lemma 1.7. If $0 \notin a$, but some
$\ell \in a$, the sum can be reduced only on dimension $k_1^\ell = k_2^\ell$, but to no purpose as will be
seen below.

So Q is written Q = n-^'^-^ EaG-p({o,....d}) Ooa + Qia, with 
and 

ieJ^ojO^^a fci,fe2 



2^ 
for some all distinct $\ell_1,\dots,\ell_{|a|-1}$ and $l_1,\dots,l_{d-|a|+1}$ whose union is $\{1,\dots,d\}$ and with $C_1 =
(4N-3)^d$. The bound for $Q_{0a}$ is also written



with special notation Aj^^ = a^^i...a^^^ for some integers pi,...,pd, < Pi < r with 
J2,i=iPi = T- And so, by Meyer's lemma this is also bounded by Z^jgj^ C2-'(l''l~^). 

For Qia with \a\ > 1, the sum in fci, ^2 could be split in k^^ . . . ,k'^^ . . . k'^'^"^'^^ where no 

concentration on the diagonal is ensured, and k^'^ . . . where lemma 1.7 is applicable, 

but precisely the multidimensional coefficient ajk = o:jki,,,kd is not guaranteed factorisable 
under any split, unless A = I. So we simply fall back to 



\a\-l 



This is also written, using Meyer's lemma at the end. 



ieJa,0^a k ieJa, O^a 

Finally, with J^ieJ 1 = l-^al given above, the general bound is written. 



Q <n 



-2d-2 



a#0 



and so 



d+l 



c=l 



< 1 {2^ <n}+ I {2-?' > n} 



1.5 Appendix 2 — Lemmas 

Lemma 1.1 (Property set) 

Let $A_1,\dots,A_r$ be $r$ non empty subsets of a finite set $\Omega$. Let $J$ be a subset of $\{1,\dots,r\}$.

Define the property set $B_J = \{x \in \cup A_j\colon x \in \cap_{j\in J}A_j;\ x \notin \cup_{j\in J^c}A_j\}$, that is to say the set of
elements belonging exclusively to the sets listed through $J$. Let $b_J = |B_J|$ and $b_\kappa = \sum_{|J|=\kappa}b_J$.
Then $\sum_{\kappa=0}^{r}\sum_{|J|=\kappa}B_J = \Omega$, and

$$|A_1| \vee\dots\vee |A_r| \le \sum_{\kappa=1}^{r}b_\kappa = |A_1\cup\dots\cup A_r| \le |A_1| + \dots + |A_r| = \sum_{\kappa=1}^{r}\kappa\,b_\kappa$$

with equality for the right part only if $b_\kappa = 0$, $\kappa = 2,\dots,r$, i.e. if all sets are disjoint, and equality for
the left part if one set $A_i$ contains all the others.

It follows from the definition that no two different property sets intersect and that the
union of property sets defines a partition of $\cup A_i$, hence a partition of $\Omega$ with the addition
of the missing complementary $\Omega - \cup A_i$ denoted by $B_\emptyset$. The $B_J$ are also the atoms of the
Boolean algebra generated by $\{A_1,\dots,A_r, \Omega - \cup A_i\}$ with usual set operations.

With $B_\emptyset$, an overlapping of $r$ sets defines a partition of $\Omega$ with cardinality at most $2^r$; there
are $C_r^\kappa$ property sets satisfying $|J| = \kappa$, with $\sum_{\kappa=0}^{r}C_r^\kappa = 2^r$.

□ 
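The property sets of lemma 1.1 are easy to compute explicitly, which gives a direct check of the two identities $\sum_\kappa b_\kappa = |A_1\cup\dots\cup A_r|$ and $\sum_\kappa \kappa\,b_\kappa = |A_1| + \dots + |A_r|$. The sketch below uses zero-based index subsets $J$; the function name `property_sets` is hypothetical.

```python
from itertools import combinations

def property_sets(sets):
    """Map each non-empty index subset J to B_J: elements lying in exactly the A_j, j in J."""
    r = len(sets)
    result = {}
    for size in range(1, r + 1):
        for J in combinations(range(r), size):
            inside = set.intersection(*(sets[j] for j in J))
            outside = set().union(*(sets[j] for j in range(r) if j not in J))
            result[J] = inside - outside
    return result

A = [{1, 2, 3}, {2, 3, 4}, {3, 5}]
B = property_sets(A)
union_size = sum(len(v) for v in B.values())               # sum of b_kappa
weighted_size = sum(len(J) * len(v) for J, v in B.items())  # sum of kappa * b_kappa
```

Here `union_size` counts each element of the union once, while `weighted_size` counts it once per set containing it.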

Lemma 1.2 (Many sets matching indices) 

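Let $m \in \mathbb{N}$, $m \ge 1$. Let $\Omega_n^m$ be the set of indices $\{(i^1,\dots,i^m)\colon i^\ell \in \mathbb{N},\ 1 \le i^\ell \le n\}$. Let $r \in \mathbb{N}$, $r \ge 1$.
Let $I_n^m = \{i \in \Omega_n^m\colon \ell_1 \ne \ell_2 \Rightarrow i^{\ell_1} \ne i^{\ell_2}\}$.

For $i = (i^1,\dots,i^m) \in \Omega_n^m$, let $\bar\imath = \cup\{i^\ell\} \subset \{1,\dots,n\}$ be the set of distinct integers in $i$.
Then, for some constant $C$ depending on $m$,

$$\#\{(i_1,\dots,i_r) \in (\Omega_n^m)^r\colon |\bar\imath_1\cup\dots\cup\bar\imath_r| = a\} = O(n^a)\,I\{|\bar\imath_1|\vee\dots\vee|\bar\imath_r| \le a \le mr\}$$

and in corollary $\#\{(i_1,\dots,i_r) \in (I_n^m)^r\colon |i_1\cup\dots\cup i_r| = a\} = O(n^a)\,I\{m \le a \le mr\}$.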

In the setting introduced by lemma 1.1, building the compound (?i,...,v) while keeping 
track of matching indices is achieved by drawing b^^ = \h\ integers in the 2°-partition 6^ = 

{1, . . . ,n} thus constituting ii, then 2} + ^{2} = l*2| integers in the 2-^-partition ^0} 
thus constituting two subindexes from which to build 12, then ^ 3}+^{2 3}+^{i 3}+^{3} — I ^sl 
integers in the 2^-partition {6^.^ 2}, b'^^^y, b'^^^y, b"^} thus constituting 2^ subindexes from which 
to build i3, and so on, up to 6^^^ r} + • • •+^{r} ~ I integers in the cardinality 2'""^ partition 
{6^^^ ^_^y b^~^} thus constituting 2''"^ subindexes from which to build v. 

The number of ways to draw the subindexes composing the r indexes is then 

.''{1} 4^{1.2} ^''{2} 4^{l,...,r} /l^y /-.^X 

{1} "{l.-.-.r-l} "0 

with the nesting property 6^ = ''j^^ + ^ju{j+i} (provided J exists at step j) and A™ = ^^"^^ . 

At step the only property set with cardinality equivalent to n, is -Bp^^, while all others 
have cardinalities lower than m ; so picking integers inside these light property sets involve 
cardinalities at most in m! that go in the constants, while the pick in entails a 

cardinality A^^l = Aj^^^ ^-^_^| « n'U . 

Note that, at step j — 1, h-'^~^ = n — |ii U . . . U ij-i |, because, at step j, b-'^.y designates 
the number of integers in ij not matching any previous index ii, . . . ; so that also 
Ei=i ^{j} = I ?i U . . . U ; and incidentally J2j3j^ b'j = \ Ij^V 
The number of integers picked from the big property set at each step is

/(''{I} /I ''{2} A'>{r} 

"a "a "$ 

with 6^ = n - I zi U . . . U = n and X]j=i ^{j} = | «i U . . . U v|. 

For large n this is equivalent to n\nu...uir\_ 

Having drawn the subindexes, building the indexes effectively is a matter of iteratively
intermixing two sets of $a$ and $b$ elements; an operation equivalent to highlighting $b$ cells in
a line of $a + b$ cells, which can be done in $C_{a+b}^b$ ways, with $C_n^p = A_n^p/p!$.

Intermixing the subindexes thus involves cardinalities at most in $m!$, that go in the constant
$C$.

Likewise, passing from $\bar\imath$ to $i$ involves cardinalities at most in $C_m$ and no dependence on $n$.

For the corollary, if $i \in I_n^m$ then $\bar\imath = i$ and $|i| = m$. If moreover $i^1 < \dots < i^m$, the number of
ways to draw the subindexes is given by replacing occurrences of $A$ by $C$ in (16), with
$C_n^m = \frac{n!}{m!(n-m)!}$, which does not change the order in $n$. Also there is only one way to intermix
subindexes, because of the ordering constraint.
□ 



Lemma 1.3 (Two sets matching indices [Corollary and complement]) 

Let $I_n^m$ be the set of indices $\{(i^1,\dots,i^m)\colon i^\ell \in \mathbb{N},\ 1 \le i^\ell \le n,\ i^\ell \ne i^l \text{ if } \ell \ne l\}$, and let $I'^m_n$ be the
subset of $I_n^m$ such that $\{i^1 < \dots < i^m\}$.

Then for $0 \le b \le m$,

$$\#\{(i_1,i_2) \in I_n^m\times I_n^m\colon |i_1\cap i_2| = b\} = A_n^m A_m^b A_{n-m}^{m-b} C_m^b = O(n^{2m-b})$$
$$\#\{(i_1,i_2) \in I'^m_n\times I'^m_n\colon |i_1\cap i_2| = b\} = C_n^m C_m^b C_{n-m}^{m-b} = O(n^{2m-b})$$

In corollary, with $P$ (resp. $P'$) the mass probability on $(I_n^m)^2$ (resp. $(I'^m_n)^2$), $P(|i_1\cap i_2| = b) \approx
P'(|i_1\cap i_2| = b) = O(n^{-b})$ and $P(|i_1\cap i_2| = 0) = P'(|i_1\cap i_2| = 0) \le 1 - m^2n^{-1} + Cn^{-2}$.

For $i_1, i_2 \in I_n^m$, the equivalence $|i_1\cap i_2| = b \iff |i_1\cup i_2| = 2m - b$ gives the link with the
general case of lemma 1.2.

Reusing the pattern of lemma 1.2 in a particular case: there are $A_n^m$ ways to constitute
$i_1$, there are $A_m^b$ ways to draw $b$ unordered integers from $i_1$ and $A_{n-m}^{m-b}$ ways to draw $m - b$
unordered integers from $\{1,\dots,n\} - i_1$.

To constitute $i_2$, intermixing both subindexes of $b$ and $m - b$ integers is equivalent to
highlighting $b$ cells in a line of $m$ cells; there are $C_m^b$ ways to do so. On $I'^m_n$ by definition,
having drawn the $b$ then $m - b$ ordered distinct integers, intermixing is uniquely determined.

Incidentally, one can check that $\sum_{b=0}^{m}A_m^b A_{n-m}^{m-b}C_m^b = A_n^m$ and that $\sum_{b=0}^{m}C_m^b C_{n-m}^{m-b} = C_n^m$.
Dividing by $(A_n^m)^2$ or $(C_n^m)^2$, both equivalent to $n^{2m}$, gives the probabilities. Finally for the
special case $b = 0$, use the fact that

$$\frac{A_{n-m}^m}{A_n^m} = \Big(1 - \frac{m}{n}\Big)\cdots\Big(1 - \frac{m}{n-m+1}\Big) \le \Big(1 - \frac{m}{n}\Big)^m.$$

□ 
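The two counting formulas of lemma 1.3 can be checked by brute force for small $n$ and $m$. The following sketch uses `math.perm` for $A_n^p$ and `math.comb` for $C_n^p$; the function names are hypothetical.

```python
from itertools import combinations, permutations
from math import comb, perm

def count_pairs_brute(n, m, b, ordered_coordinates=False):
    """Count pairs (i1, i2) with exactly b common integers, by enumeration.

    ordered_coordinates=False enumerates I_n^m (distinct coordinates, any order);
    ordered_coordinates=True enumerates I'_n^m (strictly increasing coordinates).
    """
    draw = combinations if ordered_coordinates else permutations
    tuples = list(draw(range(1, n + 1), m))
    return sum(1 for i1 in tuples for i2 in tuples if len(set(i1) & set(i2)) == b)

def count_pairs_formula(n, m, b, ordered_coordinates=False):
    """Closed forms of lemma 1.3."""
    if ordered_coordinates:
        return comb(n, m) * comb(m, b) * comb(n - m, m - b)
    return perm(n, m) * perm(m, b) * perm(n - m, m - b) * comb(m, b)
```

Summing either formula over $b = 0,\dots,m$ gives the square of the number of indices, which is the "incidentally" identity in the proof.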



Lemma 1.4 (Product of r kernels of degree m) 

Let r G N*. Let to > 1. Let he an independent, identically distributed sample of a 

random variable on M''. Let fi™ be the set of indices {(i^, . . . , i"*): P G N, 1 < i-' < n} . 

For i e n™, de^ne 

bik = Vjk{Xfl) . . . iPjk{Xl::i )$jfe(^i-l + l) • • • $jfe(Ximi+mJ. 

Let I he the set of distinct coordinates in i and let c = c{ii, . . . v) = |ii U . . . U v| be the overall 
number of distinct coordinates in r indices (ii, . . .ir) € {Cll^y . 

Then 

with Cd ~ Crf(«i, . . . v) < c the fraction of c corresponding to products with at least one ^{X) term 
and 1 < Cd < rUcir, < c — < ruir, 1 < c < (toi + TO2)r. 

Using lemma 1.1, one can see that the product $a_{i_1k_1}\cdots a_{i_rk_r}$, made of $mr$ terms, can always
be split into $|i_1\cup\dots\cup i_r|$ independent products of $c(l)$ dependent terms, $1 \le l \le |i_1\cup\dots\cup i_r|$,
with $c(l)$ in the range from $|i_1|\vee\dots\vee|i_r|$ to $mr$ and $\sum_l c(l) = mr$.

Using lemma 1.8, a product of $c(l)$ dependent terms is bounded under expectation by
$C2^{\frac{jd}{2}(c(l)-2)}$. Accumulating all independent products, the overall order is $C2^{\frac{jd}{2}(mr-2|i_1\cup\dots\cup i_r|)}$.

For hi^ki ■ ■ ■ bi^k^ make the distinction between groups containing at least one ^{X) term and 
the others containing only (p{X^) terms. This splits the number |?i U . . . U id} into g<p^^ + g^. 
Let c^{l) be the number of (p terms in a product of c(Z) terms, mixed or not. 

On the g^^^ groups containing $ terms, first bound the product of c^{l) terms by C22''-pii) ^ 
and the remaining terms by C2t(c(0-cv>(0-2)_ Qn the g^ groups with only ip terms, bound 
the product by C22('=¥'W-2). 

The overall order is then 

C72^[(ESr^W-«.>(0)-29*.„] 2iEffr^v>(0 2i[(Ef=i^v'(0)-2sJ_ 

The final bound is found using Efli + Ef=r ^,^(0 = '^i^ ^^'^ J2^i=i c(0 - = ^d.r- 
Rename $c_d = g_{\Phi,\varphi}$ and $c - c_d = g_\varphi$.

As for the constraints, in the product of $(m_1 + m_d)r$ terms, it is clear that $\Phi$ terms have to
be found somewhere, so $c_d \ge 1$, which also implies that $c - c_d = 0$ when $c = 1$ (in this case
there is no independent group with only $\varphi$ terms, but only one big group with all indices
equal). Otherwise $c_d \le m_dr$ and $c - c_d \le m_1r$ since there are no more than these numbers of
$\Phi$ and $\varphi$ terms in the overall product.



Lemma 1.5 (Meyer) 

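Let $V_j$, $j \in \mathbb{Z}$, be an $r$-regular multiresolution analysis of $L_2(\mathbb{R}^d)$ and let $\varphi \in V_0$ be the father wavelet.

There exist two constants $c_2 > c_1 > 0$ such that for all $p \in [1, +\infty]$ and for all finite sum
$f(x) = \sum_k\alpha(k)\varphi_{jk}(x)$ one has,

$$c_1\|f\|_p \le 2^{jd(\frac{1}{2}-\frac{1}{p})}\Big(\sum_k|\alpha(k)|^p\Big)^{1/p} \le c_2\|f\|_p.$$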

See Meyer (1997) 

We use the bound under a special form. 

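First note that if $f \in B_{sp\infty}$, $\|f\|_{sp\infty} = \|P_jf\|_p + \sup_j 2^{js}\|f - P_jf\|_p$ so that $\|f - P_jf\|_p \le
C\|f\|_{sp\infty}2^{-js}$. So using (3),

$$\sum_k|\alpha_{jk}|^p \le C2^{jd(1-p/2)}\|P_jf\|_p^p \le C2^{jd(1-p/2)}2^{p-1}\big(\|f\|_p^p + \|f - P_jf\|_p^p\big)$$
$$\le C2^{jd(1-p/2)}2^{p-1}\big(\|f\|_p^p + C\|f\|_{sp\infty}^p2^{-jps}\big)$$
$$\le C2^{jd(1-p/2)}\|f\|_{sp\infty}^p.$$

When applying the lemma to special coefficient $\lambda_{jk}^{\langle r\rangle} = \alpha_{jk^1}^{p_1}\cdots\alpha_{jk^d}^{p_d}$ for some integers
$p_1,\dots,p_d$, $0 \le p_i \le r$ with $\sum_{i=1}^{d}p_i = r$, we use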



<C2i(2d-r)||^^ax/*^||^,^ 

so that even if some $p_i$ was zero, the result is a $2^{j}$, which returns the effect of $\sum_k 1$.
□ 
Lemma 1.6 (Path of non matching dimension numbers)

Let $r \in \mathbb{N}$, $r \ge 2$. Let $\Omega_n^d = \{(i^1,\dots,i^d)\colon i^\ell \in \mathbb{N},\ 1 \le i^\ell \le n\}$. For $i \in \Omega_n^d$, let $\Lambda_{jk}(V_i) =
\varphi_{jk^1}(X^1_{i^1})\cdots\varphi_{jk^d}(X^d_{i^d})$. Let $\bar\imath$ be the set of distinct coordinates of $i$.

In the product 

(EE;^E V(v'.))' = ^ E E ^nkAy^.)--^^.K{Vi.) 

j k iGO^ ii,...,ir-e(n^y jl---jr ki...,kr. 

unless $|\bar\imath_1\cup\dots\cup\bar\imath_r| < r$, it is always possible to find indices $(i_1,\dots,i_r)$ such that no two functions
$\varphi_{jk}$, $\varphi_{jk'}$ match on observation number.

Let $c = |\bar\imath_1\cup\dots\cup\bar\imath_r|$. For $1 \le \ell \le n$, let $\ell^{\otimes d} = (\ell,\dots,\ell) \in \Omega_n^d$.

With $r$ buckets of width $d$ defined by the extent of each index $k_1,\dots,k_r$, and only $c < r$
distinct observation numbers, once $c$ buckets have been stuffed with terms $V_{\ell^{\otimes d}}$, some
already used observation number must be reused in order to fill in the remaining $r - c$
buckets. So that $r - c$ buckets will match on dimension and observation number, allowing
the sum to be reduced to only $c$ distinct buckets.

Once $c \ge r$, starting with a configuration using $V_{\ell_1^{\otimes d}},\dots,V_{\ell_r^{\otimes d}}$ we can always use additional
observation numbers to fragment further the $\ell^{\otimes d}$ terms, which preserves the empty
intersection between buckets.
□ 



Lemma 1.7 (Daubechies wavelet concentration property) 

Let $r \in \mathbb{N}$, $r \ge 1$. Let $\varphi$ be the scaling function of a Daubechies wavelet $D2N$. Let $h_k$ be the function
on $\mathbb{R}^m$ defined as a product of translations of $\varphi$,

$$h_k(x_1,\dots,x_m) = \varphi(x_1 - k^1)\cdots\varphi(x_m - k^m),$$

with $k = (k^1,\dots,k^m) \in \mathbb{Z}^m$.

Then for a Haar wavelet $\big[\sum_k h_k(x_1,\dots,x_m)\big]^r = \sum_k h_k(x_1,\dots,x_m)^r$.
For any $D2N$,

$$\Big(\sum_k|h_k(x_1,\dots,x_m)|\Big)^r \le (4N-3)^{m(r-1)}\sum_k|h_k(x_1,\dots,x_m)|^r \qquad (17)$$



With a Daubechies wavelet $D2N$, whose support is $[0, 2N-1]$ with $\varphi(0) = \varphi(2N-1) = 0$
(except for Haar where $\varphi(0) = 1$), one has the relation

$$x \mapsto \varphi(x - k)\varphi(x - \ell) = 0, \quad\text{for } |\ell - k| \ge 2N - 1;$$

when $k$ is fixed, the cardinal of the set $|\ell - k| < 2N - 1$ is equal to $(4N - 3)$.
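Both the count $4N - 3$ and the Haar identity of the lemma can be checked directly. The sketch below is an illustration only: it handles the Haar case ($\varphi = 1_{[0,1)}$) for the identity, and the helper names are hypothetical.

```python
import numpy as np
from itertools import product

def overlapping_translates(N, k=0):
    """Integers l with |l - k| < 2N - 1, the only translates that can overlap phi(. - k)."""
    width = 2 * N - 1
    return [l for l in range(k - width, k + width + 1) if abs(l - k) < width]

def haar_product(x, k):
    """h_k(x_1..x_m) = prod_i phi(x_i - k^i) for the Haar scaling function phi = 1_[0,1)."""
    x, k = np.asarray(x, dtype=float), np.asarray(k, dtype=float)
    return float(np.all((x - k >= 0.0) & (x - k < 1.0)))

def haar_sums(x, r, span=range(-3, 4)):
    """Return ([sum_k h_k(x)]^r, sum_k h_k(x)^r) over multi-indices k with entries in span."""
    values = [haar_product(x, k) for k in product(span, repeat=len(x))]
    return sum(values) ** r, sum(v ** r for v in values)
```

For Haar exactly one translate is non zero at any point, which is why the two sums coincide; for $N > 1$ up to $4N - 3$ translates per dimension can contribute, hence the constant in (17).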
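So that, with $k_1,\dots,k_r$ denoting $r$ independent multi-indices,

$$\Big(\sum_k|h_k|\Big)^r = \sum_{k_1,\dots,k_r}|h_{k_1}|\cdots|h_{k_r}|\,I\{\Delta\}$$

with $\Delta = \{|k_{i_1}^\ell - k_{i_2}^\ell| < 2N - 1;\ i_1, i_2 = 1,\dots,r;\ \ell = 1,\dots,m\}$. Once $k_1$ say, is fixed, the
cardinal of $\Delta$ is not greater than $(4N-3)^{m(r-1)}$ and is exactly equal to 1 for Haar, when all
$k_1 = \dots = k_r$.

For any Daubechies wavelet, and $r \ge 1$, using the inequality $(|h_{k_1}|^r\cdots|h_{k_r}|^r)^{1/r} \le \frac{1}{r}\sum_i|h_{k_i}|^r$,

$$\Big(\sum_k|h_k|\Big)^r \le \sum_{k_1,\dots,k_r}\frac{1}{r}\big(|h_{k_1}|^r + \dots + |h_{k_r}|^r\big)\,I\{\Delta\}$$
$$= \frac{1}{r}\Big[\sum_{k_1}|h_{k_1}|^r\sum_{k_2,\dots,k_r}I\{\Delta\} + \dots + \sum_{k_r}|h_{k_r}|^r\sum_{k_1,\dots,k_{r-1}}I\{\Delta\}\Big]$$
$$\le (4N-3)^{m(r-1)}\sum_k|h_k|^r.$$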



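Lemma 1.8 ($r$th order moment of $\Phi_{jk}$)

Let $X$ be a random variable on $\mathbb{R}^d$ with density $f$. Let $\Phi$ be the tensorial scaling function of an MRA
of $L_2(\mathbb{R}^d)$. Let $\alpha_{jk} = E_f\Phi_{jk}(X)$. Then for $r \in \mathbb{N}^*$,

$$E_f|\Phi_{jk}(X) - \alpha_{jk}|^r \le 2^rE_f|\Phi_{jk}(X)|^r \le 2^r2^{jd(\frac{r}{2}-1)}\|f\|_\infty\|\Phi\|_r^r.$$

If $\Phi$ is the Haar tensorial wavelet then also $E_f\Phi_{jk}(X)^r \le 2^{\frac{jd}{2}(r-1)}\alpha_{jk}$.

For the left part of the inequality, $\big(E_f|\Phi_{jk}(X) - \alpha_{jk}|^r\big)^{1/r} \le \big(E_f|\Phi_{jk}(X)|^r\big)^{1/r} + E_f|\Phi_{jk}(X)|$, and
also $E_f|\Phi_{jk}(X)| \le \big(E_f|\Phi_{jk}(X)|^r\big)^{1/r}\big(E_f1\big)^{1-1/r}$.

For the right part, $E_f|\Phi_{jk}(X)|^r = 2^{\frac{jdr}{2}}\int|\Phi(2^jx - k)|^rf(x)\,dx \le 2^{jd(\frac{r}{2}-1)}\|f\|_\infty\|\Phi\|_r^r$.
Or also if $\Phi$ is positive,

$$E_f\Phi_{jk}(X)^r = 2^{\frac{jd}{2}(r-1)}\int\Phi(2^jx - k)^{r-1}\Phi_{jk}(x)f(x)\,dx \le 2^{\frac{jd}{2}(r-1)}\|\Phi\|_\infty^{r-1}\alpha_{jk}.$$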
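A worked instance may help fix the orders of magnitude. The sketch below assumes $d = 1$, the Haar scaling function $\varphi_{jk} = 2^{j/2}1_{[k2^{-j},(k+1)2^{-j})}$ and the density $f(x) = 2x$ on $[0,1]$, so that $\|f\|_\infty = 2$ and $\|\Phi\|_r = 1$; these choices, and the function names, are illustrative assumptions only.

```python
import math

def haar_moment(j, k, r):
    """E|phi_jk(X)|^r for Haar phi_jk and density f(x) = 2x on [0, 1] (assumed example)."""
    mass = ((k + 1) ** 2 - k ** 2) / 4 ** j   # P(X in [k 2^-j, (k+1) 2^-j))
    return 2 ** (j * r / 2) * mass

def lemma_bound(j, r, sup_f=2.0):
    """2^{jd(r/2 - 1)} ||f||_inf ||Phi||_r^r with d = 1 and ||Phi||_r = 1 for Haar."""
    return 2 ** (j * (r / 2 - 1)) * sup_f

def haar_refinement(j, k, r):
    """2^{(jd/2)(r - 1)} alpha_jk with d = 1, where alpha_jk = E phi_jk(X)."""
    return 2 ** (j * (r - 1) / 2) * haar_moment(j, k, 1)
```

In this example the Haar refinement holds with equality, while the general bound is attained only in the limit where the density is close to its supremum on the support of $\varphi_{jk}$.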